CONTRIBUTIONS  TO  THE  THEORY  OF  ROBUST  REGRESSION 


BY 

ROBERT  ALLEN  WESLEY 


TECHNICAL  REPORT  NO.  2 
DECEMBER  1977 


U.S.  ARMY  RESEARCH  OFFICE 
RESEARCH  TRIANGLE  PARK,  NORTH  CAROLINA 
CONTRACT  NO.  DAAG29-76-G-0213 


DEPARTMENT  OF  STATISTICS 
STANFORD  UNIVERSITY 
STANFORD,  CALIFORNIA 


Report  Documentation  Page 




1. REPORT DATE: DEC 1977
3. DATES COVERED: 00-00-1977 to 00-00-1977
4. TITLE AND SUBTITLE: Contributions to the Theory of Robust Regression
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Stanford University, Department of Statistics, Stanford, CA, 94305
12. DISTRIBUTION/AVAILABILITY STATEMENT: Approved for public release; distribution unlimited
16. SECURITY CLASSIFICATION OF: a. REPORT: unclassified; b. ABSTRACT: unclassified; c. THIS PAGE: unclassified
17. LIMITATION OF ABSTRACT: Same as Report (SAR)
18. NUMBER OF PAGES: 90

Standard  Form  298  (Rev.  8-98) 

Prescribed  by  ANSI  Std  Z39-18 




ACKNOWLEDGEMENTS 


There are many people to whom I am greatly indebted. I would first like to thank my parents for all of their encouragement and help. To Dr. Bentley I wish to extend my thanks for his friendship and for introducing me to statistics. Without the patience, advice, and good humor of my advisor, Professor M.V. Johns, this thesis would never have been completed. Lastly I wish to thank my wife, Margaret Nakamura Wesley, for her support, her ideas, and her love.




TABLE  OF  CONTENTS 


                                                    PAGE

List of Symbols and Notation                           v

CHAPTER 1  INTRODUCTION
   1.1  Description of the Problem                     1
   1.2  Techniques for Regression Estimation           4
   1.3  Jaeckel's Estimator                            9
   1.4  Outline of Results                            15

CHAPTER 2  CONSISTENCY OF JAECKEL'S ESTIMATOR FOR
           NONMONOTONE SCORES
   2.1  Model and Assumptions                         17
   2.2  Consistency Proof                             19
   2.3  Counterexample                                31
   2.4  Comments on Paper of Stigler                  34
   2.5  Counterexamples to Stigler                    38
   2.6  Proofs of Corrected Results                   42
   2.7  Miscellaneous Results                         50

CHAPTER 3  ADAPTING JAECKEL'S ESTIMATOR
   3.1  Adaptive Estimators and the Kink Family       55
   3.2  Assumptions and Bickel-Rosenblatt Result      58
   3.3  Preliminary Results                           61
   3.4  Asymptotics for Adaptive Estimator            73

CHAPTER 4  FUTURE WORK
   4.1  Possible Extensions                           78

BIBLIOGRAPHY                                          81


LIST  OF  SYMBOLS  AND  NOTATION 


For the sake of reference we record some of the more common notation and symbols used in the paper. For those symbols whose use is specific to this paper, we also record in parentheses the page(s) on which they are first used or where their definition may be found.

Standard  Notation 

In the background there is a probability space $(\Omega, A, P)$ on which all random variables are defined. The elements of $\Omega$ are denoted $\omega$. For a random variable $W$, $E(W)$ refers to the expected value of $W$. Two random variables are of note: $N(0,1)$ is a normal random variable with mean 0 and variance 1; $U(a,b)$ is a random variable uniformly distributed on $(a,b)$. If $\{Z_n\}$ is a sequence of random variables, then $Z_n \xrightarrow{D} Z$ means that the $Z_n$ converge in distribution to the random variable $Z$. For a function $a(x)$, $a^-(x_0)$ is notation for $\lim_{x \uparrow x_0} a(x)$ and $a^+(x_0) = \lim_{x \downarrow x_0} a(x)$. w.p.1 (with probability one) and a.e. (almost everywhere) are equivalent notations. $R$ and $R^m$ are one-dimensional and $m$-dimensional Euclidean space respectively.

If $A$ is a set, $I_A(x)$ is the indicator function: $I_A(x) = 1$ if $x \in A$ and $0$ if $x \notin A$. If $X_1, \ldots, X_n$ are random variables, the order statistics are $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$. Lastly, if $A$ and $B$ are two sets, $A \setminus B$ is the set difference defined to equal $A \cap B^c$.
Symbols (Greek letters follow the Roman letters)

A:  $a_N(\cdot)$ (8, 10, 17); $a(\cdot)$ (41, 44); $A$ (59); $a(t)$ (61, 67); $a^*(\cdot)$ (71).

B:  $B$ (18); $B$ (18); $B^\circ$ (18); $B_j$ (18); $b(N)$ (60); $B_w$ (59); $B_{w'}$ (59); $b^*(\cdot)$ (71).

C:  $c_i$ (1); ? (1); $c_i^*$ (1); $c_j$ (2); $C_j$ (19); $C$ (10); $C(u)$ (25).

D:  $D_N(\cdot)$ (10, 17).

E:  $e_i$ (1); $e_i^\beta$ (12); $e_{(i)}^\beta$ (12); $e$ (59).

F:  $F$ (1); $f$ (11); $F_{\beta N}$ (21); $F_\beta$ (21); $F_{1n}, \ldots, F_{nn}$ (35); $F$ (38, 42); $F_n$ (42); ? (59); $f_N$ (59); $F^{-1}$ (61); $F^{-1}$ (69).

G:  $G(y)$ (21, 42).

H:  $H(\cdot)$ (18); $h$ (32); $H_n$ (40); $H^*$ (40); $H_n$ (44).

I:  $I(f)$ (18); $I_n$ (44); $I_{1n}, I_{2n}$ (44); $I$ (61); $I^*$ (62).

J:  $J(u)$ (8, 17); ? (56).

K:  $K_\beta$ (22); $K$ (42).

L:  $L_1, L_2$ (47).

N:  $N$ (1).

P:  $p_n$ (50, 53).

Q:  $q$ (1).

S:  $S_N$ (20); $S_n$ (35).

T:  $t_1, \ldots, t_p$ (18, 43); $T_1, T_2$ (64).

U:  $U(\cdot)$ (76).

V:  $v(\cdot)$ (57); ? (73); $V(\cdot)$, $V_N(\cdot)$ (76).

W:  $w(x)$ (59).

X:  $X_{1n}, \ldots, X_{nn}$ (35); $X_{(i)}$ (35); $X$ (50).

Y:  $Y_i$ (1, 2, 3, 17); $\mathbf{Y}$ (10); $Y$ (50).

Z:  $Z_{kn}$ (38).

α:  $\alpha$ (as a parameter) (1, 2); $\alpha$ (as trimming proportion) (18); $\alpha_0$ (10).

β:  $\beta$, $\beta^*$ (1); $\beta$ (2); $\beta_0$ (10, 17); $\hat\beta_N$ (17); $\beta_N$ (58); $\beta^*$ (73).

ε:  (18, 49).

ζ:  (62).

η:  $\eta(J, F_{\beta N})$ (23); $\eta_N(\beta)$ (23); $\eta(J, F_\beta)$ (23); $\eta(\beta)$ (23); $\eta(J, F)$ (36).

ξ:  $\xi$ (61); $\xi_0$ (73); $\xi$ (73).

σ:  $\Sigma$ (19); $\sigma^2(J, F_\beta, K)$ (22); $\sigma^2$ (22).

φ:  $\phi_f$ (11).
CHAPTER 1. INTRODUCTION

1.1 Description of the Problem

The general problem we will be concerned with in this paper is that of estimating the regression parameters $\alpha, \beta_1, \ldots, \beta_q$ in the general regression model

$$Y_i = \alpha + \mathbf{c}_i^T \boldsymbol{\beta} + e_i, \qquad i = 1, 2, \ldots, N, \tag{1.1}$$

where $Y_i$ is the $i$th observation on the dependent variable, $\alpha$ is the intercept parameter, $\boldsymbol{\beta}$ is a column vector of slope parameters with $\boldsymbol{\beta}^T = (\beta_1, \ldots, \beta_q)$ ($T$ denotes transpose), $\mathbf{c}_i^T = (c_{i1}, \ldots, c_{iq})$ is the vector of regression constants associated with the $i$th observation, and $e_i$ is the random error associated with the $i$th observation. A formulation equivalent to (1.1) which is sometimes more convenient is

$$Y_i = \mathbf{c}_i^{*T} \boldsymbol{\beta}^* + e_i, \tag{1.2}$$

where now $\mathbf{c}_i^{*T} = (1, c_{i1}, \ldots, c_{iq})$ and $\boldsymbol{\beta}^{*T} = (\alpha, \beta_1, \ldots, \beta_q)$. Throughout we will assume that the $\{e_i : i = 1, \ldots, N\}$ are independent, identically distributed (iid) random variables (rvs) with cumulative distribution function (cdf) $F(x)$, which is symmetric but possibly far from the normal distribution.

More specifically, throughout most of the paper we will consider the simpler problem of estimating $\alpha, \beta$ in the simple linear regression case, for which the model (1.1) becomes

$$Y_i = \alpha + \beta c_i + e_i; \tag{1.3}$$

now $\beta$ and $c_i$ are scalars. Besides simplicity, another reason for considering (1.3) was pointed out by Huber in [15]:

"Note that the simple straight line regression problem is basic; if we know how to treat this, we can in principle attack the general regression problem by considering one parameter at a time, keeping the others fixed at trial values."

The main feature of note in models (1.1) and (1.3) is that we do not assume that the random errors $\{e_i\}$ are normally distributed. Indeed we will not assume we know the form of the distribution function $F$. In this context the problem of estimating the regression parameters becomes a problem of robust estimation and shares many features with the problem of robust estimation of a location parameter, which has been considered extensively in the last decade (cf. [3], [11], [12], [13], [14], [16], [17], [19]). Here we use the word "robust" to refer to statistical procedures good for a broad class of possible underlying models. Asymptotically this can be viewed as demanding high absolute (asymptotic) efficiencies for all suitably smooth shapes. (For this and other approaches to "robustness", consult [12] and [14].)
In the location problem the model in which we are interested is

$$Y_i = \theta + e_i, \tag{1.4}$$

where $\theta$ is the unknown location parameter, $Y_i$ is the observation, and the $\{e_i\}$ are iid random variables with common cdf $F$ assumed to be symmetric around 0. The simple linear regression model of (1.3) is one of the simplest generalizations of this problem; both the regression models (1.3) and (1.1) include the location model as a special case and are important in practical applications.

For the location problem three different classes of estimators of $\theta$ have been proposed: L estimators, M estimators, and R estimators. Briefly summarizing: the L estimators are linear combinations of the ordered observations $Y_{(1)} \le Y_{(2)} \le \cdots \le Y_{(N)}$; the M estimators are analogues of maximum likelihood estimators, with the estimator $\hat\theta$ of $\theta$ satisfying

$$\sum_{i=1}^{N} \rho(Y_i - \hat\theta) = \text{minimum}, \tag{1.5}$$

or the equation

$$\sum_{i=1}^{N} \psi(Y_i - \hat\theta) = 0, \tag{1.6}$$

where $\rho$ is usually a convex function and $\psi = \rho'$; R estimators, such as the well-known Hodges-Lehmann estimator (cf. [11]), are derived from rank tests. For the problem of estimating regression parameters, each of these three classes of estimators has been generalized. In the next section we will consider these generalizations after briefly reviewing the classical technique of estimation for location and regression problems (the method of least squares) and some of its history.
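To make the three classes concrete, the following Python sketch (assuming NumPy; all function names are hypothetical) computes one representative of each for the location model (1.4): a trimmed mean as an L estimator; an M estimator obtained by solving (1.6) with the clipped score $\psi(x) = \max(-k, \min(k, x))$, where the constant $k = 1.345$ and a unit error scale are assumptions of this illustration; and the Hodges-Lehmann estimator, the median of the Walsh averages $(Y_i + Y_j)/2$, $i \le j$, as an R estimator.

```python
import numpy as np


def trimmed_mean(y, prop=0.2):
    """L estimator: mean of the order statistics left after dropping
    floor(prop * N) observations from each end."""
    y = np.sort(np.asarray(y, dtype=float))
    g = int(np.floor(prop * len(y)))
    return y[g:len(y) - g].mean()


def huber_location(y, k=1.345, n_iter=200):
    """M estimator: root of sum_i psi(Y_i - theta) = 0, equation (1.6),
    with psi(x) = max(-k, min(k, x)).  The sum is nonincreasing in theta,
    nonnegative at min(y) and nonpositive at max(y), so bisection applies."""
    y = np.asarray(y, dtype=float)

    def score(theta):
        return np.clip(y - theta, -k, k).sum()

    lo, hi = y.min(), y.max()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid          # the root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def hodges_lehmann(y):
    """R estimator: median of the Walsh averages (Y_i + Y_j)/2 over i <= j."""
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(y))          # includes the diagonal i == j
    return np.median((y[i] + y[j]) / 2.0)


sample = [1.0, 2.0, 3.0, 4.0, 100.0]   # one gross outlier
print(np.mean(sample), trimmed_mean(sample),
      huber_location(sample), hodges_lehmann(sample))
```

On this sample the arithmetic mean is dragged to 22 by the single outlier, while all three robust estimates equal 3 (the M estimate up to the bisection tolerance).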

1.2  Techniques  for  Regression  Estimation 

1.2.1 The problem of estimating regression parameters is a very old one in the history of statistics, and a number of techniques have been developed for dealing with it. The classical technique of estimation, the method of least squares due to Gauss and others, was developed by the early nineteenth century. The motivation for least squares, as described by Huber, is interesting:

"The original motivation for this method (due to Gauss) is somewhat circular: least squares estimates are optimal if the errors are independent identically distributed normal; on the other hand, Gauss assumed a normal error law because then the sample mean, which 'is generally accepted as a good estimate,' turns out to be optimal in the simplest special case...." (p. 799 of [15]; also see p. 1042 of [14]).

The linchpin in this justification is, of course, the faith placed in the arithmetic mean. It is interesting to contrast the historical dominance of least squares theory with the number of alternative techniques suggested over those 150 years which never gained prominence and which have only recently been resurrected. (For a history of these procedures and others, see [28].) Already in 1818 Laplace proposed, for estimating regression through the origin, minimizing the sum of absolute residuals rather than the sum of squared residuals. In the 1840s, in a paper criticizing Gauss' original justification for least squares, Ellis proposed essentially what are now called M estimators.

In the last several decades, the prominent position of least squares and the accompanying classical normal theory has been vigorously questioned. One of the most recent assessments of the possibility of poor performance of classical least squares theory comes from the Princeton robustness study (see [3]), which evaluated dozens of different estimators in the simple location problem. Their answer to the question "Which was the worst estimator in the study?" was:

"If there is any clear candidate for such an overall statement, it is the arithmetic mean...." (p. 239 of [3]).

1.2.2 As was noted earlier, for the problem of estimating parameters in the regression models (1.1) and (1.3), a number of alternatives to the method of least squares have been developed. One such estimator for simple linear regression was originally proposed by Theil in 1950 [30] and later elaborated by Sen [27], who derived its asymptotic properties. Based on a rank test, the Sen-Theil estimator provides an estimate of only the slope parameter $\beta$. In the simplest case where all the $c_i$ are different, the estimate of $\beta$ is simply the median of the $\binom{N}{2}$ slopes $(Y_j - Y_i)/(c_j - c_i)$ joining pairs of points.
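A direct transcription of the Sen-Theil slope estimate, as a minimal sketch assuming NumPy (the function name is hypothetical). Pairs with $c_i = c_j$ are simply skipped; this is one common convention and goes beyond the "simplest case" described above.

```python
import numpy as np


def sen_theil_slope(c, y):
    """Median of the pairwise slopes (Y_j - Y_i)/(c_j - c_i), i < j,
    computed over the pairs with c_i != c_j."""
    c = np.asarray(c, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(c), k=1)      # all pairs with i < j
    keep = c[i] != c[j]
    slopes = (y[j] - y[i])[keep] / (c[j] - c[i])[keep]
    return np.median(slopes)


c = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 100.0]   # the line 1 + 2c, except for the last point
print(sen_theil_slope(c, y))       # 2.0
```

Six of the ten pairwise slopes equal 2, so the gross error in the last observation leaves the median untouched.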

There are also the estimators constructed to be analogues to the L, M, and R estimators for location. The most intensely studied class of analogues is the M-type estimators; papers on this class include ones by Relles [26], Huber [15], Andrews [2], Bickel [p], and Yohai [31]. In this case (1.5) generalizes to: the estimator $\hat{\boldsymbol{\beta}}^*$ of $\boldsymbol{\beta}^*$ is that vector which causes

$$\sum_{i=1}^{N} \rho(Y_i - \mathbf{c}_i^{*T} \hat{\boldsymbol{\beta}}^*) = \text{minimum}. \tag{1.7}$$

Obviously one member of this class is the least squares estimator: take $\rho(x) = x^2$. A more robust proposal for $\rho$ given by Huber in [15] is

$$\rho(x) = \begin{cases} \tfrac{1}{2} x^2, & |x| \le c, \\ c|x| - \tfrac{1}{2} c^2, & |x| > c, \end{cases} \tag{1.8}$$

where $c$ is a constant. An entire family of M estimators is defined by

$$\rho(x) = |x|^a \quad \text{for } 1 \le a \le 2. \tag{1.9}$$

This family contains both the least squares estimator ($a = 2$) and the estimator proposed by Laplace ($a = 1$). Forsythe [9] and others have studied members of this family and the family as a whole.
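One way to minimize (1.7) with Huber's $\rho$ from (1.8) is iteratively reweighted least squares (IRLS), a standard technique used here for illustration rather than one prescribed in the text. The sketch assumes NumPy, residuals measured on a fixed unit scale, and the commonly used constant $c = 1.345$; the function names and the simulated data are hypothetical.

```python
import numpy as np


def huber_rho(x, c=1.345):
    """Huber's rho, equation (1.8): x**2/2 for |x| <= c, c|x| - c**2/2 beyond."""
    ax = np.abs(x)
    return np.where(ax <= c, 0.5 * x**2, c * ax - 0.5 * c**2)


def huber_regression(y, C, c=1.345, max_iter=500, tol=1e-10):
    """Minimise sum_i rho(Y_i - c_i*^T beta*) by IRLS.

    Each step solves a weighted least squares problem with weights
    w_i = psi(r_i)/r_i = min(1, c/|r_i|); the residual scale is taken as 1."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), C])    # rows c_i*^T = (1, c_i1, ..., c_iq)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least squares starting value
    for _ in range(max_iter):
        r = y - X @ beta
        w = c / np.maximum(np.abs(r), c)          # equals min(1, c/|r|)
        sw = np.sqrt(w)
        new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(new - beta)) < tol:
            return new
        beta = new
    return beta


# The line alpha = 2, beta = 3 with small deterministic noise and three gross outliers.
c_vals = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * c_vals + 0.1 * np.sin(7.0 * np.arange(50))
y[-3:] += 50.0
X = np.column_stack([np.ones(50), c_vals])
print("least squares:", np.linalg.lstsq(X, y, rcond=None)[0])
print("Huber        :", huber_regression(y, c_vals))
```

Least squares is pulled far from the true slope by the three outliers, while the Huber fit stays much closer, because each outlier's contribution to the estimating equations is capped at $c$.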
In the published literature there appears to be only one analogue of an L estimator for regression, that of Bickel Qf]. The analogy with the L estimates for location is more tenuous, however, than is the straightforward analogy obtaining for M estimates. Unlike the L estimate for location, Bickel's estimator requires a preliminary "reasonable" estimate of $\boldsymbol{\beta}^*$ in order to form a type of one-step improvement. On the other hand, the asymptotic results Bickel derives for his analogue are identical (except of course for the dependence on the design matrix $(c_{ij})$) to those which hold in the location case. An interesting special case of Bickel's estimator is the analogue to the trimmed mean: the residuals derived from fitting the model with the preliminary estimate are ordered and a "position index" is associated with each; one trims those observations leading to residuals with extreme position indices and then forms the standard least squares estimate from the remaining observations (cf. pp. 599-601 of Qf] for details).
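The trimming idea can be sketched as follows. This is a deliberately simplified illustration and not Bickel's estimator: here the residuals from a user-supplied preliminary fit are simply ranked and a fraction is trimmed from each end of that ranking, whereas Bickel's construction uses its own "position index" (see the cited pages). NumPy, the trimming rule, the function name and the data are all assumptions.

```python
import numpy as np


def trimmed_least_squares(y, C, beta_prelim, prop=0.1):
    """Hypothetical sketch of a trimmed-least-squares analogue.

    1. Form residuals from the preliminary estimate beta_prelim = (alpha, beta_1, ...).
    2. Drop the floor(prop*N) observations with the smallest residuals and the
       floor(prop*N) with the largest.
    3. Fit ordinary least squares to the observations that remain."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), C])
    resid = y - X @ np.asarray(beta_prelim, dtype=float)
    order = np.argsort(resid)
    g = int(np.floor(prop * len(y)))
    keep = order[g:len(y) - g]
    return np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]


c_vals = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * c_vals + 0.01 * np.sin(7.0 * np.arange(50))
y[-3:] += 50.0                       # three gross outliers
# The preliminary estimate is supplied by the user (e.g. a Sen-Theil fit).
print(trimmed_least_squares(y, c_vals, beta_prelim=(2.0, 3.0)))
```

With a reasonable preliminary fit the three outliers land at the extreme of the residual ranking and are trimmed, so the final least squares step sees only well-behaved observations.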

Several estimators have been developed for the regression problem based on rank tests. We have already mentioned Sen-Theil; in addition there are R-type estimators proposed by Adichie [1], Koul [22], Kraft and van Eeden [23], Jurečková [21], and Jaeckel [18]. The oldest of these proposals is that of Adichie, who is concerned with estimation in the simple linear regression model. His estimators $\hat\alpha$ and $\hat\beta$ are constructed in a manner very similar to that of the Hodges-Lehmann estimator for shift: one forms rank test statistics $T_1$ and $T_2$ for testing hypotheses on $\alpha$ and $\beta$ respectively, and then by "inverting" these tests one derives the values for $\hat\alpha$ and $\hat\beta$ (cf. pp. 895-896 of [1] for details). There is an error in Adichie's paper which will assume some importance for us later: for Adichie's methods of proof to work, a necessary condition on the score function $\psi_0(u)$ used to generate the rank statistic is missing in the assumptions for the asymptotic normality of $\hat\beta$ (pp. 894-895 and p. 898). Specifically, he needs to assume that $\psi_0$ is monotone increasing to ensure that his conditions (A) and (B) (p. 895) obtain.

For the general regression problem, Jurečková [21] considered generalizations of the rank tests used by Adichie in order to define her estimates of the regression parameters. Her basic approach is to first estimate the vector $\boldsymbol{\beta}$, and then to estimate $\alpha$ using a location estimate on the resulting residuals. For estimating $\boldsymbol{\beta}$ she defines, for the sample $(Y_1, \ldots, Y_N)$ and a score generating function $J(u)$, the rank statistics

$$S_{Nj}(\mathbf{b}) = N^{-1/2} \sum_{i=1}^{N} (c_{ij} - \bar c_j)\, a_N(R_i^{\mathbf{b}}), \qquad j = 1, \ldots, q, \tag{1.10}$$

with $\bar c_j = N^{-1} \sum_{i=1}^{N} c_{ij}$, $\mathbf{b} = (b_1, \ldots, b_q)^T$, $a_N(k) = J\!\left(\frac{k}{N+1}\right)$, and $(R_1^{\mathbf{b}}, \ldots, R_N^{\mathbf{b}})^T$ the vector of ranks corresponding to the variables $Y_i - \mathbf{c}_i^T \mathbf{b}$, $i = 1, \ldots, N$. (We note that in the case of simple linear regression, $q = 1$, Jurečková's rank statistic corresponds to Adichie's rank statistic $T_2$.) Jurečková's estimate $\hat{\boldsymbol{\beta}}$ of $\boldsymbol{\beta}$ is then any value of $\mathbf{b}$ which minimizes

$$\sum_{j=1}^{q} |S_{Nj}(\mathbf{b})|. \tag{1.11}$$
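For $q = 1$ and Wilcoxon scores $J(u) = u - 1/2$ (an illustrative choice), (1.10) and (1.11) can be sketched as below, assuming NumPy. The helper names, the grid range and the use of a grid search are assumptions, chosen for transparency rather than efficiency. Because these scores are monotone, $S_N(b)$ is a nonincreasing step function of $b$, so minimizing $|S_N(b)|$ amounts to locating the point where it changes sign.

```python
import numpy as np


def ranks(x):
    """Ranks 1..N; ties (which occur only for isolated values of b) are broken by index."""
    return np.argsort(np.argsort(x, kind="stable"), kind="stable") + 1


def jureckova_S(b, c, y, J=lambda u: u - 0.5):
    """S_N(b) of (1.10) for q = 1 with scores a_N(k) = J(k/(N+1));
    the default J(u) = u - 1/2 gives Wilcoxon scores."""
    c = np.asarray(c, dtype=float)
    y = np.asarray(y, dtype=float)
    N = len(y)
    R = ranks(y - b * c)                     # R_i^b: rank of residual Y_i - b c_i
    a = J(R / (N + 1.0))                     # a_N(R_i^b)
    return np.sum((c - c.mean()) * a) / np.sqrt(N)


def jureckova_estimate(c, y, grid):
    """Grid value of b minimising |S_N(b)|, i.e. (1.11) with q = 1."""
    S = np.array([jureckova_S(b, c, y) for b in grid])
    return grid[np.argmin(np.abs(S))], S


c = np.linspace(0.0, 1.0, 40)
y = 1.0 + 2.0 * c + 0.01 * np.sin(5.0 * np.arange(40))
grid = np.linspace(0.0, 4.0, 4001)
b_hat, S = jureckova_estimate(c, y, grid)
print(b_hat)
```

The intercept plays no role here: adding a constant to every $Y_i$ leaves all ranks, and hence $S_N(b)$, unchanged, which is why $\alpha$ has to be estimated separately from the residuals.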

For the case $q = 1$ this definition leads to an estimate of $\beta$ slightly different from Adichie's. Another estimator, closely related to Jurečková's but of a different motivation, is discussed next.


1.3 Jaeckel's Estimator

1.3.1 Jaeckel's method of estimation is concerned exclusively with estimating the vector of regression parameters $\boldsymbol{\beta}$ in the general regression model; the technique does not directly admit an estimate of the intercept $\alpha$. However, as in Jurečková's case, after the estimate $\hat{\boldsymbol{\beta}}$ is computed, $\alpha$ may be estimated by applying a robust estimate of location to the residuals $\{Y_i - \mathbf{c}_i^T \hat{\boldsymbol{\beta}}\}_{i=1}^{N}$. The starting point for defining Jaeckel's estimate is a measure of the dispersion of a set of numbers; given this measure of dispersion, one constructs the residuals using the different possible values of $\boldsymbol{\beta}$ and chooses as one's estimate the vector minimizing the dispersion of the residuals. This procedure has many elements in common with the M-estimators discussed earlier.

Indeed, if for example one were to define the dispersion of the vector $\mathbf{z} = (z_1, \ldots, z_N)^T$ as $\sum_{i=1}^{N} z_i^2$, then the procedure outlined above simply leads to the least squares estimates. What differentiates Jaeckel's procedure from the M-estimators, however, is his definition of the dispersion. Jaeckel defines the dispersion function $D_N(\mathbf{z})$ as
$$D_N(\mathbf{z}) = \sum_{k=1}^{N} a_N(k)\, z_{(k)}, \tag{1.12}$$

where $z_{(1)} \le z_{(2)} \le \cdots \le z_{(N)}$ and $\{a_N(k)\}$ are a set of scores satisfying

$$\sum_{k=1}^{N} a_N(k) = 0.$$

Thus for a vector of observations $\mathbf{Y} = (Y_1, \ldots, Y_N)^T$, a vector $\mathbf{b}$, and design matrix $C = (c_{ij})$, the dispersion is

$$D_N(\mathbf{Y} - C\mathbf{b}) = \sum_{k=1}^{N} a_N(k)\, [Y_i - \mathbf{c}_i^T \mathbf{b}]_{(k)}, \tag{1.13}$$

where the bracketed quantity is notation for the $k$th ordered residual. The estimate of $\boldsymbol{\beta}$ is then any value of $\mathbf{b}$ which minimizes $D_N(\mathbf{Y} - C\mathbf{b})$.

(Note on notation: In the remainder of the paper, we will distinguish the true value of the parameter $\beta$ (or $\boldsymbol{\beta}$) by denoting it as $\beta_0$ (or $\boldsymbol{\beta}_0$); similarly $\alpha_0$ will denote the true value of $\alpha$, so model (1.3) reads

$$Y_i = \alpha_0 + \beta_0 c_i + e_i, \tag{1.3}$$

and so forth. Also, when no confusion will arise, we will shorten $D_N(\mathbf{Y} - C\boldsymbol{\beta})$ to $D_N(\boldsymbol{\beta})$ to emphasize the dispersion as a function of $\boldsymbol{\beta}$.)
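A minimal sketch of (1.12) and (1.13) for general $q$, assuming NumPy and Wilcoxon scores $a_N(k) = k/(N+1) - 1/2$ (an illustrative choice; the function names and simulated data are hypothetical). It also checks two facts that follow from the definitions: because the scores sum to zero, adding a constant to every residual leaves $D_N$ unchanged (which is why the intercept drops out), and for Wilcoxon scores $D_N(\mathbf{z}) = \frac{1}{2(N+1)} \sum_{i<j} |z_i - z_j|$.

```python
import numpy as np


def wilcoxon_scores(N):
    """a_N(k) = k/(N+1) - 1/2, k = 1..N; these sum to zero."""
    k = np.arange(1, N + 1)
    return k / (N + 1.0) - 0.5


def dispersion(b, Y, C, scores):
    """D_N(Y - Cb) of (1.13): scores[k-1] times the k-th smallest residual."""
    resid = np.asarray(Y, float) - np.asarray(C, float) @ np.atleast_1d(b)
    return np.dot(scores, np.sort(resid))


rng = np.random.default_rng(0)
N, q = 30, 2
C = rng.normal(size=(N, q))
Y = 1.0 + C @ np.array([2.0, -1.0]) + rng.standard_t(df=3, size=N)
a = wilcoxon_scores(N)

b = np.array([1.5, -0.5])
r = Y - C @ b
pairwise = np.abs(r[:, None] - r[None, :]).sum() / 2.0   # sum over pairs i < j
print(dispersion(b, Y, C, a), pairwise / (2.0 * (N + 1)))
```

The two printed numbers agree, which shows that with Wilcoxon scores the dispersion is a scaled sum of absolute pairwise differences of the residuals.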


1.3.2 There are several motivations for Jaeckel's estimator. The first is that incorporated into the title of Jaeckel's paper: the idea of minimizing the "dispersion" of the residuals. This, as we noted, is the same idea as involved in least squares and all M estimators: all Jaeckel has done is to choose another, but still intuitively appealing, formula for measuring how far a hypothesized line (or hyperplane in general) is from a set of observed values $(c_1, Y_1), \ldots, (c_N, Y_N)$ (or $C$ and $\mathbf{Y}$ in general).

The second motivation is not so apparent. But as Jaeckel proved, the asymptotic behavior of his estimator and that of Jurečková's is identical. This coincidence, it turns out, derives from the fact that, when it exists, the derivative of $D_N(\boldsymbol{\beta})$ with respect to the $\beta_j$ is (except for a constant factor) equal to the set of Jurečková's rank statistics $S_{Nj}(\boldsymbol{\beta})$. And so the good results for the asymptotic behavior of Jurečková's estimator carry over to Jaeckel's.
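To see the constant explicitly for $q = 1$: while the ranks $R_i^b$ do not change, $D_N(b) = \sum_i a_N(R_i^b)(Y_i - b c_i)$ is linear in $b$ with slope $-\sum_i c_i\, a_N(R_i^b) = -\sum_i (c_i - \bar c)\, a_N(R_i^b)$, because the scores sum to zero; with the $N^{-1/2}$ normalization of (1.10) this equals $-N^{1/2} S_N(b)$. The sketch below (NumPy, Wilcoxon scores, made-up data) checks this with a central difference at $b = 0$, which lies strictly between two of the pairwise slopes $(Y_j - Y_i)/(c_j - c_i)$.

```python
import numpy as np

c = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.3, 1.1, 2.9, 2.2, 4.7])
N = len(y)
a = np.arange(1, N + 1) / (N + 1.0) - 0.5          # Wilcoxon scores a_N(k)


def D(b):
    """Jaeckel's dispersion D_N(b) for simple regression (q = 1)."""
    return np.dot(a, np.sort(y - b * c))


def S(b):
    """Jureckova's statistic S_N(b) of (1.10) with the same scores."""
    R = np.argsort(np.argsort(y - b * c))          # 0-based ranks, so a[R] = a_N(R_i^b)
    return np.sum((c - c.mean()) * a[R]) / np.sqrt(N)


b0, h = 0.0, 1e-6        # no pairwise slope lies in (b0 - h, b0 + h)
slope = (D(b0 + h) - D(b0 - h)) / (2 * h)
print(slope, -np.sqrt(N) * S(b0))
```

Both printed values are $-1.5$ up to rounding, so the derivative of the dispersion and the rank statistic differ only by the factor $-N^{1/2}$.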

A third possible motivation for Jaeckel's estimator, which he did not mention, is related to the asymptotic behaviors (as functions of $\beta$) of the dispersion and of the log likelihood. We briefly consider a heuristic derivation of this relationship. We note that Jaeckel's results and proofs are unrelated to these heuristics and do not indicate the conditions under which they can be formalized. Suppose that the errors $\{e_i\}$ are iid with cdf $F$ and density $F' = f$. We assume we use the scores generated by the function

$$\phi_f(u) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}$$

(i.e. $a_N(k) = \phi_f\!\left(\frac{k}{N+1}\right)$, $k = 1, \ldots, N$); for errors with cdf $F$ this turns out to be the optimal choice under certain restrictions. For simplicity we take $q = 1$ (simple linear regression); and since the dispersion can only be used to estimate $\beta$, we take $\alpha_0 = 0$. Then letting $L_N(\beta; \mathbf{Y}) = L_N(\beta)$ be the likelihood based on $N$ observations, we have

$$\log L_N(\beta) = \sum_{i=1}^{N} \log f(e_i^\beta) = \sum_{i=1}^{N} \log f(e_{(i)}^\beta), \tag{1.14}$$

where $e_i^\beta = Y_i - \beta c_i$ and $e_{(i)}^\beta$ is the $i$th ordered value among $\{Y_k - \beta c_k\}_{k=1}^{N}$. We consider the behavior of $\log L_N(\beta)$ in the vicinity of the true value $\beta_0$. For $\beta = \beta_0$ and $N$ large, $e_{(i)}^{\beta_0} \approx F^{-1}(i/(N+1))$; expanding $\log f(e_{(i)}^\beta)$, for $\beta$ near $\beta_0$, about $F^{-1}(i/(N+1))$ in a Taylor series we get

$$\log f(e_{(i)}^\beta) = \log f\!\left(F^{-1}\!\left(\tfrac{i}{N+1}\right)\right) + \left[e_{(i)}^\beta - F^{-1}\!\left(\tfrac{i}{N+1}\right)\right] \frac{f'}{f}\!\left[F^{-1}\!\left(\tfrac{i}{N+1}\right)\right] + \text{higher order terms}. \tag{1.15}$$

Since $\frac{f'}{f}\left[F^{-1}\left(\frac{i}{N+1}\right)\right] = -a_N(i)$, on summing (1.15) over $i$ the first-order terms contribute $-\sum_{i} a_N(i)\, e_{(i)}^\beta = -D_N(\beta)$ plus a quantity not depending on $\beta$; thus we get

$$\log L_N(\beta) = k_N - D_N(\beta) + R_N(\beta),$$

where $k_N = \sum_{i=1}^{N} \left[\log f\!\left(F^{-1}\!\left(\tfrac{i}{N+1}\right)\right) + a_N(i)\, F^{-1}\!\left(\tfrac{i}{N+1}\right)\right]$ is a constant independent of $\beta$ and $R_N(\beta)$ is the sum of the higher order terms. Thus we see that, at least locally at $\beta_0$, $-D_N(\beta)$ and the log likelihood have similar asymptotic behavior if we use the correct scores in defining the dispersion. If the global behavior of $D_N(\beta)$ is reasonable, then we might expect similar asymptotic behavior for the maximum likelihood estimator and Jaeckel's estimator.
A fourth motivation for Jaeckel's estimate relates to its computational feasibility. First, unlike several other recent proposals (cf. [5], [23]), Jaeckel's estimator does not require a preliminary estimate of $\boldsymbol{\beta}$. Second, Jaeckel's estimator seems to be easier to compute than Jurečková's proposal [21]. As Jaeckel points out, if the scores $a_N(k)$ are monotone increasing, then the dispersion is a continuous, convex function of $\boldsymbol{\beta}$; also it is straightforward to compute the derivatives of the dispersion, which exist almost everywhere. Thus one can apply iterative methods in searching for the minimum; Jaeckel mentions the possibility of using the method of steepest descent.

One of the strengths of Jaeckel's estimator is its asymptotic performance as compared to that of other proposed estimators in use. As an example Jaeckel considered the simple linear regression case and compared the performances of the well-known Sen-Theil estimator and his estimator. Since the distribution of the errors is generally unknown, he chose for his scores $a_N(k)$ those optimal for the logistic distribution (Wilcoxon scores): $a_N(k) = \frac{k}{N+1} - \frac{1}{2}$. For this choice his estimate becomes a "weighted median" of the pairwise slopes $(Y_j - Y_i)/(c_j - c_i)$, with weights $|c_j - c_i|$. With this set-up the asymptotic variance of the Sen-Theil estimator is always greater than (or equal to) that of Jaeckel's estimator.
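The weighted-median form follows from an identity for Wilcoxon scores: when the $c_i$ are distinct, $D_N(b) = \frac{1}{2(N+1)} \sum_{i<j} |c_j - c_i| \left|\frac{Y_j - Y_i}{c_j - c_i} - b\right|$, and a weighted sum of absolute deviations is minimized at a weighted median. The sketch below assumes NumPy; the function names are hypothetical and the simulated Cauchy data are only for illustration. It computes the estimate and checks that no pairwise slope gives a smaller dispersion.

```python
import numpy as np


def jaeckel_wilcoxon_slope(c, y):
    """Weighted median of pairwise slopes (Y_j - Y_i)/(c_j - c_i), weights |c_j - c_i|.

    Returns the smallest slope at which the cumulative weight reaches half the
    total, a minimiser of sum_{i<j} |c_j - c_i| * |slope_ij - b| over b."""
    c = np.asarray(c, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(c), k=1)
    keep = c[i] != c[j]
    dc = (c[j] - c[i])[keep]
    s = (y[j] - y[i])[keep] / dc
    w = np.abs(dc)
    order = np.argsort(s)
    s, w = s[order], w[order]
    cum = np.cumsum(w)
    return s[np.searchsorted(cum, 0.5 * cum[-1])]


def wilcoxon_dispersion(b, c, y):
    """D_N(b) with Wilcoxon scores a_N(k) = k/(N+1) - 1/2."""
    N = len(y)
    a = np.arange(1, N + 1) / (N + 1.0) - 0.5
    return np.dot(a, np.sort(np.asarray(y) - b * np.asarray(c)))


rng = np.random.default_rng(1)
c = rng.uniform(0.0, 10.0, size=25)
y = 0.5 + 1.5 * c + rng.standard_cauchy(size=25)
b_hat = jaeckel_wilcoxon_slope(c, y)
print(b_hat)
```

Because $D_N$ is convex and piecewise linear with breakpoints at the pairwise slopes, comparing it against its value at every pairwise slope is enough to confirm that the weighted median attains the minimum.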

1.3.3 There are several areas of weakness in the results for the estimators of Jaeckel, Jurečková, and Adichie. The first is the requirement in all three theories that the scores be generated by a function $\phi(u)$ which is monotone increasing. This is an important restriction since for many choices of $F$, the distribution function of the errors, the "optimal" choice of scores, $\phi_f(u) = -f'(F^{-1}(u))/f(F^{-1}(u))$, is not monotone increasing. If $F$ is the Cauchy distribution, then $\phi_f(u) = \sin(2\pi u - \pi)$, which is not monotone for $u \in [0,1]$. Indeed, an easy calculation shows that if $F$ is a $t$ distribution with $n$ degrees of freedom the corresponding $\phi_f(u)$ is not monotone for any choice of $n$. Also numerical results indicate that if $F$ is one of a variety of contaminated normal distributions, then $\phi_f$ is not monotone either.

It turns out that $\phi_f(u)$ is monotone if and only if $F' = f$ is a so-called "strongly unimodal" density (cf. [l (f]). Obviously for applications one would like to be able to use non-monotone scores and be assured of the asymptotic performance of the resulting estimator.
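The monotonicity claims can be checked numerically. Since $F^{-1}$ is strictly increasing, $\phi_f(u)$ is monotone in $u$ exactly when $-f'(x)/f(x)$ is monotone in $x$; for the $t$ distribution with $n$ degrees of freedom this is $(n+1)x/(n + x^2)$, which rises up to $x = \sqrt{n}$ and then falls. The sketch below (NumPy only; the grid and the chosen values of $n$ are arbitrary assumptions) compares these with the logistic and normal cases, whose densities are strongly unimodal. It does not attempt the contaminated-normal computations mentioned above.

```python
import numpy as np

# phi_f(u) = -f'(F^{-1}(u)) / f(F^{-1}(u)).  Because F^{-1} is strictly increasing,
# phi_f is monotone in u exactly when psi(x) = -f'(x)/f(x) is monotone in x,
# so it suffices to examine psi on a grid of x values.
x = np.linspace(-50.0, 50.0, 20001)


def psi_t(x, n):
    """-f'/f for Student's t with n degrees of freedom (n = 1 is the Cauchy)."""
    return (n + 1.0) * x / (n + x**2)


def psi_logistic(x):
    """-f'/f for the logistic density; equals 2F(x) - 1 = tanh(x/2)."""
    return np.tanh(x / 2.0)


def psi_normal(x):
    """-f'/f for the standard normal density."""
    return x


def is_monotone_increasing(v):
    return bool(np.all(np.diff(v) >= 0.0))


u = np.linspace(0.001, 0.999, 999)
cauchy_phi = np.sin(2.0 * np.pi * u - np.pi)    # Cauchy scores quoted in the text

print("Cauchy:", is_monotone_increasing(cauchy_phi))
for n in (1, 2, 5, 30):
    print(f"t_{n}:", is_monotone_increasing(psi_t(x, n)))
print("logistic:", is_monotone_increasing(psi_logistic(x)))
print("normal:", is_monotone_increasing(psi_normal(x)))
```

The Cauchy and $t$ cases report `False` and the logistic and normal cases report `True`, in line with the strong-unimodality criterion.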

The second deficiency arises since, in practice, one seldom knows the distribution function $F$. In this case one may, of course, choose an omnibus score function $\phi$, such as the Wilcoxon score function mentioned earlier; or if one has some idea of the shape of $F$, one can try to choose $\phi$ which works reasonably well (although not optimally) for all these feasible shapes. An alternative to choosing one specific score function is to choose the score function (possibly from a given family of choices) on the basis of the sample, that is, to make the estimator adaptive. In the location problem using L estimators, adapting has been used quite successfully: not only has it provided estimators which are (nearly) asymptotically optimal over a large nonparametric family of error distributions, but the estimators also perform well in samples of modest size. (For details on adapting L estimators, see Johns [19].) It would be very useful if adaptive estimators with similar properties could be found in the regression problem.

1.4 Outline of Results

The main results of this paper fall into three categories. The first category contains results related to the asymptotic consistency of Jaeckel's estimator when non-monotone score functions are used. In Section 2.2 conditions are stated for the problem of simple linear regression under which consistency obtains, together with a proof of the result, which is similar in spirit to the classic proof by Wald of the consistency of maximum likelihood estimates. Section 2.3 contains a counterexample to the consistency of Jaeckel's estimator based on non-monotone scores if certain of the conditions on the error distribution, invoked in Section 2.2, are not met. It should be noted that these conditions are not necessary if one uses monotone scores.

In proving the results of Sections 2.2 and 2.3, we utilize
results due to Stigler [29] on the behavior of linear combinations
of order statistics. However, some of the results stated in his paper
are incorrect, and there are errors and gaps in some of his proofs.
The second category of results, contained in Sections 2.4 through
2.7, corrects these deficiencies so that the results can be used in the
proofs concerning consistency.

The  third  category  of  results,  comprising  Chapter  3,  addresses 
the  difficulty  of  not  knowing  the  true  distribution  of  the  errors 
in  the  general  regression  model.  An  adaptive  estimator  is  proposed 
based  on  a  family  of  Jaeckel-type  estimators  (with  monotone  scores), 
and  results  are  proved  concerning  its  asymptotic  behavior.  These 
results show that, at least asymptotically, one loses very little by
not knowing the error distribution (provided it is strongly unimodal)
if one uses this adaptive estimator.

CHAPTER 2. CONSISTENCY OF JAECKEL'S ESTIMATOR FOR NONMONOTONE SCORES

2.1 Model and Assumptions

Throughout this chapter we will be concerned with simple linear
regression through the origin:

(2.1)   $Y_i = \beta_0 c_i + e_i, \qquad i = 1, \ldots, N,$

where $Y_1, \ldots, Y_N$ are observations on the dependent variable, $c_1, \ldots, c_N$
are regression constants, $\beta_0$ is the unknown slope parameter to be
estimated, and $e_1, \ldots, e_N$ are iid random variables with distribution
function F. We then define Jaeckel's dispersion function for the
score-generating function J(u) ($J: [0,1] \to \mathbb{R}$) by

(2.2)   $D_N(\beta) = \sum_{k=1}^{N} a_N(k)\, e^{\beta}_{(k)},$

where $a_N(k) = J\big(k/(N+1)\big)$ and $e^{\beta}_{(k)}$ is the $k$th ordered value among the
residuals $\{Y_i - \beta c_i : i = 1, \ldots, N\}$, as defined earlier on p. 12. Note that in
the definition of $D_N$, the dependence on the $\{Y_i\}$ and $\{c_i\}$ has been
suppressed. Jaeckel's estimate of $\beta_0$ is denoted by $\hat\beta_N$ and is any value
of $\beta$ in the parameter space $B^\circ$ satisfying

(2.3)   $D_N(\hat\beta_N) \le D_N(\beta)$ for all $\beta \in B^\circ$.
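As a concrete illustration of (2.2), the Python sketch below evaluates $D_N(\beta)$ in two equivalent ways. The first is a weighted sum of the ordered residuals. The second is a sum over observations, with each residual weighted by the score of its rank: $\sum_i a_N(R_i)(Y_i - \beta c_i)$, where $R_i$ is the rank of the $i$th residual. The score `score_bump` is a hypothetical example, not taken from the text. It is chosen to satisfy Assumptions J1 and J2 stated below: it is antisymmetric about 1/2, non-monotone, Lipschitz, and trims at α. The simulated data are likewise only illustrative. For a non-decreasing J whose scores sum to zero, such as J(u) = u - 1/2, $D_N(\beta) \ge 0$; the sketch checks this too.

```python
import math
import random


def score_bump(u, alpha=0.2):
    """Hypothetical trimmed, non-monotone score: J(1/2 + t) = t * (w - |t|), w = 1/2 - alpha.

    J vanishes outside (alpha, 1 - alpha) and satisfies J(1/2 - t) = -J(1/2 + t).
    """
    t = u - 0.5
    w = 0.5 - alpha
    if abs(t) >= w:
        return 0.0
    return t * (w - abs(t))


def wilcoxon(u):
    """Monotone Wilcoxon-type score J(u) = u - 1/2 (scores a_N(k) sum to zero)."""
    return u - 0.5


def dispersion(beta, y, c, J):
    """D_N(beta) = sum_k a_N(k) * e_(k), with a_N(k) = J(k / (N + 1))."""
    n = len(y)
    resid = sorted(yi - beta * ci for yi, ci in zip(y, c))
    return sum(J(k / (n + 1)) * r for k, r in enumerate(resid, start=1))


def dispersion_rank_form(beta, y, c, J):
    """Same quantity as sum_i a_N(R_i) * (Y_i - beta c_i), R_i = rank of residual i."""
    n = len(y)
    resid = [yi - beta * ci for yi, ci in zip(y, c)]
    order = sorted(range(n), key=lambda i: resid[i])
    rank = [0] * n
    for k, i in enumerate(order, start=1):
        rank[i] = k
    return sum(J(rank[i] / (n + 1)) * resid[i] for i in range(n))


random.seed(0)
N = 25
c = [random.uniform(-1.0, 1.0) for _ in range(N)]   # regression constants, |c_i| <= 1
y = [random.gauss(0.0, 1.0) for _ in range(N)]       # true slope beta_0 = 0
d1 = dispersion(0.3, y, c, score_bump)
d2 = dispersion_rank_form(0.3, y, c, score_bump)
print(d1, d2)
```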
Assumptions

We will now summarize the usual assumptions we will invoke
in the course of this chapter.

F1: F is unimodal and symmetric about 0.

F2: F has density f with $f \le B_f$, a finite constant.
Also F has finite Fisher information $I(f) = \int [f'/f]^2 f$.

F3: (Tail condition) For some $\varepsilon_1 > 0$, $\lim_{x\to\infty} x^{\varepsilon_1}[1 - F(x) + F(-x)] = 0$.

B1: $B^\circ \subset \mathbb{R}$ is a compact set; without loss of generality take
$B^\circ = [-B, B]$.

C1: Define $H_N(x) = N^{-1} \cdot \#\{c_i \le x : i = 1, \ldots, N\}$, where $\#A$
denotes the cardinality of a set A. Thus $H_N$ is the "sample
distribution function" of the $\{c_i : i = 1, \ldots, N\}$. We
assume $|c_i| \le B_c$ for all i, so $H_N$ concentrates all its
mass on $[-B_c, B_c]$, and that $H_N \to H$ weakly for some distribution
function H. We assume the variance of H, var(H), is
strictly positive.

J1: $0 \le J(\tfrac12 + u) = -J(\tfrac12 - u)$ for $u \in [0, \tfrac12]$, with $J(\tfrac12 + u) > 0$
on some interval.

J2: $|J(u)| \le B_J < \infty$ for $u \in [0,1]$. Also, J satisfies a Hölder
condition with $\gamma > \tfrac12$ (i.e. for $u, v \in [0,1]$,
$|J(u) - J(v)| \le \text{constant} \cdot |u - v|^{\gamma}$),
except possibly at a finite set of points $t_1, \ldots, t_m$.
Lastly J trims: for some $\alpha \in (0, \tfrac12)$, $J(u) = 0$ if
$u \le \alpha$ or $u \ge 1 - \alpha$.
Jaeckel's Theorem

For the sake of reference, we state Jaeckel's asymptotic normality
result:

Theorem Let F have finite Fisher information and suppose that J(u)
is non-constant, non-decreasing, and square integrable on (0,1), such
that $\sum_{k=1}^{N} a_N(k) = 0$. Then under some technical assumptions on the
$\{\mathbf{c}_i\}$ (cf. p. 1328 of [21]),
$N^{1/2}(\hat{\boldsymbol\beta}_N - \boldsymbol\beta_0) \xrightarrow{d} N(\mathbf 0, \mathbf V)$, where $\hat{\boldsymbol\beta}_N$ is Jaeckel's
estimator and

$\mathbf V = \dfrac{\int_0^1 (J(u) - \bar J)^2\,du}{\left[\int_0^1 J(u)\,\phi_f(u)\,du\right]^2}\;\boldsymbol\Sigma^{-1},$

with $\bar J = \int_0^1 J(u)\,du$, $\phi_f(u) = -f'(F^{-1}(u))/f(F^{-1}(u))$, and $\boldsymbol\Sigma = [\sigma_{ij}]$ with
$\sigma_{ij} = \lim_{N\to\infty} N^{-1} \sum_{k=1}^{N} (c_{ki} - \bar c_i)(c_{kj} - \bar c_j)$ and $\bar c_j = N^{-1} \sum_{k=1}^{N} c_{kj}$.
If q = 1 (simple linear regression), $\boldsymbol\Sigma$ is a scalar equal to var(H). §
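The constant V in Jaeckel's theorem can be evaluated numerically. The Python sketch below approximates both integrals by the midpoint rule for simple linear regression (q = 1, so Σ = var(H)). For standard normal errors, $-f'(x)/f(x) = x$, so $\phi_f(u) = F^{-1}(u)$. With the Wilcoxon score J(u) = u - 1/2 the exact value is V = π/(3 var(H)); with normal scores J = φ_f it is V = 1/var(H). The function name `asymptotic_variance` and the grid size are illustrative choices, not part of the theorem.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def phi_normal(u):
    """phi_f(u) = -f'(F^{-1}(u)) / f(F^{-1}(u)); for the standard normal this is F^{-1}(u)."""
    return _STD_NORMAL.inv_cdf(u)


def asymptotic_variance(J, phi_f, var_H, m=100_000):
    """V = int (J - Jbar)^2 / (int J phi_f)^2 / var(H), midpoint rule on (0, 1)."""
    us = [(i + 0.5) / m for i in range(m)]
    j_vals = [J(u) for u in us]
    j_bar = sum(j_vals) / m
    num = sum((j - j_bar) ** 2 for j in j_vals) / m
    den = sum(j * phi_f(u) for j, u in zip(j_vals, us)) / m
    return num / den ** 2 / var_H


V_wilcoxon = asymptotic_variance(lambda u: u - 0.5, phi_normal, var_H=1.0)
V_normal_scores = asymptotic_variance(phi_normal, phi_normal, var_H=1.0)
print(V_wilcoxon, math.pi / 3, V_normal_scores)
```

The ratio 3/π between the two values is the familiar efficiency of Wilcoxon-type scores relative to normal scores under normal errors.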


2.2  Consistency  Proof 

In this section we prove the consistency of Jaeckel's estimator
$\hat\beta_N$ for the true slope parameter $\beta_0$. Because of the invariance of
Jaeckel's estimator (cf. p. 1452 of [18]), we will assume without loss
of generality that $\beta_0 = 0$ in the remainder of the paper. The method
of proof will be to use Theorem 7* of Section 2.6 to derive asymptotic
results about the behavior of the dispersion $D_N(\beta)$ and then use the
compactness of the set $B^\circ$ of possible values of the slope parameter.

For ease of reference we now state a version of Theorem 7*
which we will use in this section ($\sigma^2(Z)$ is an alternate notation for
the variance of a random variable Z):


Theorem 7* (special version) Let $X_{1N}, \ldots, X_{NN}$ be independent random
variables with cdfs $F_{1N}, \ldots, F_{NN}$ respectively; suppose that for some
cdf G(y), for which there is an $\varepsilon > 0$ such that $\lim_{x\to\infty} x^{\varepsilon}[1 - G(x) + G(-x)] = 0$,
there is a finite constant M such that if $y \le -M$ then $F_{kN}(y) \le G(y)$ and
if $y \ge M$ then $F_{kN}(y) \ge G(y)$ (for all k, N). Assume that for a.e. x, y
(with respect to Lebesgue measure) the following limits exist:

$\lim_{N\to\infty} \bar F_N(x) = \bar F(x)$, where $\bar F_N(x) = N^{-1} \sum_{k=1}^{N} F_{kN}(x)$,

and $\lim_{N\to\infty} N^{-1} \sum_{k=1}^{N} \{F_{kN}(\min(x,y)) - F_{kN}(x)F_{kN}(y)\} = K(x,y)$.

Also assume that $\bar F_N^{-1}$ is absolutely continuous with respect to
Lebesgue measure for each N. Let J be a score function satisfying
Assumption J2 and define $S_N = N^{-1} \sum_{i=1}^{N} J(i/(N+1))\, X_{(i)}$, where
$X_{(i)}$ is the $i$th ordered value among $X_{1N}, \ldots, X_{NN}$. Then

(i) $\lim_{N\to\infty} N\sigma^2(S_N) = \sigma^2(J, \bar F, K)$ (given below);

(ii) if $\sigma^2(J, \bar F, K) > 0$, then $\dfrac{S_N - E(S_N)}{\sigma(S_N)} \xrightarrow{d} N(0,1)$ as $N \to \infty$;

(iii) $N^{1/2}\big(E(S_N) - \mu(J, \bar F_N)\big) \to 0$ as $N \to \infty$, where

$\mu(J, \bar F_N) = \int_0^1 J(u)\, \bar F_N^{-1}(u)\,du$ and
$\sigma^2(J, \bar F, K) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} J(\bar F(x))\, J(\bar F(y))\, K(x,y)\,dx\,dy$. §


We now consider the application of this result. Let the value
of $\beta$ be fixed. In order to use Theorem 7* we define the function

(2.4)   $G(y) = \begin{cases} F(y - BB_c), & y \ge BB_c, \\ F(y + BB_c), & y < -BB_c, \end{cases}$

with G defined arbitrarily in $[-BB_c, BB_c)$ so that G(y) is a distribution function.


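Then

$\lim_{x\to\infty} x^{\varepsilon_1}[1 - G(x) + G(-x)] = \lim_{x\to\infty} \{(x - BB_c)^{\varepsilon_1}\,[1 - F(x - BB_c) + F(-(x - BB_c))]\} \cdot \lim_{x\to\infty} \left(\dfrac{x}{x - BB_c}\right)^{\varepsilon_1} = 0 \cdot 1 = 0$

by Assumption F3, where $\varepsilon_1$ is as in that assumption.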

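Defining the residual $e_i^{\beta} = Y_i - \beta c_i$ and letting $F^{\beta}_{iN}$ be the cdf
of $e_i^{\beta}$ (so that $F^{\beta}_{iN}(x) = F(x + \beta c_i)$, since $\beta_0 = 0$), we note that, because
$|\beta c_k| \le BB_c$, the cdfs $\{F^{\beta}_{kN}\}_{k=1}^{N}$ and G satisfy the required
relationship in the assumptions of Theorem 7*, with the M of Theorem 7*
being $BB_c$. Also, for all x,

(2.5)   $\bar F_{\beta,N}(x) = N^{-1} \sum_{k=1}^{N} F^{\beta}_{kN}(x) = \int F(x + \beta c)\, dH_N(c) \to \bar F_\beta(x)$

as $N \to \infty$, where $\bar F_\beta(x) \equiv \int F(x + \beta c)\, dH(c)$, by Assumption C1 (the integrand is
bounded and continuous in c); similarly, for all x and y,

(2.6)   $N^{-1} \sum_{k=1}^{N} \{F^{\beta}_{kN}(\min(x,y)) - F^{\beta}_{kN}(x)\, F^{\beta}_{kN}(y)\} \to K_\beta(x,y)$

as $N \to \infty$, where $K_\beta(x,y) = \int [F(\min(x,y) + \beta c) - F(x + \beta c)\,F(y + \beta c)]\, dH(c)$.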

Since F(x) is everywhere continuous and strictly increasing
(for x such that $F(x) \in (0,1)$) by Assumptions F1 and F2, for each fixed
$\beta$, $\bar F_{\beta,N}(x)$ and $\bar F_\beta(x)$ are also everywhere continuous and strictly
increasing, implying $\bar F_{\beta,N}^{-1}$ and $\bar F_\beta^{-1}$ are absolutely continuous with
respect to Lebesgue measure. Thus J is uniformly continuous a.e. $\bar F_\beta^{-1}$
and satisfies the Hölder condition a.e. $\bar F_{\beta,N}^{-1}$ for all N = 1, 2, ... by
Assumption J2. Thus Theorem 7* is applicable. Letting

(2.7)   $S_N^{\beta} = N^{-1} \sum_{i=1}^{N} J(i/(N+1))\, e^{\beta}_{(i)}$,

Theorem 7* implies (with $\sigma^2(W)$ denoting the variance of the r.v. W)

(2.8)   $N\sigma^2(S_N^{\beta}) \to \sigma^2(J, \bar F_\beta, K_\beta) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} J(\bar F_\beta(x))\, J(\bar F_\beta(y))\, K_\beta(x,y)\,dx\,dy$.


For simplicity we usually denote $\sigma^2(J, \bar F_\beta, K_\beta)$ by $\sigma^2_\beta$ to emphasize its
dependence on $\beta$. We can assume $\sigma^2_\beta > 0$, since otherwise equation (2.11)
below follows immediately. Then by Theorem 7* we can conclude, since
$D_N(\beta) = N S_N^{\beta}$, that

(2.9)   (i) $\dfrac{D_N(\beta) - E\,D_N(\beta)}{\sigma(D_N(\beta))} \xrightarrow{d} N(0,1)$ and

        (ii) $N^{1/2}\left|N^{-1}E\,D_N(\beta) - \mu(J, \bar F_{\beta,N})\right| \to 0$

as $N \to \infty$, where $\mu(J, \bar F_{\beta,N}) = \int_0^1 J(u)\, \bar F_{\beta,N}^{-1}(u)\,du$ and E denotes expectation.
We usually denote $\mu(J, \bar F_{\beta,N})$ by $\mu_N(\beta)$ and $\mu(J, \bar F_\beta)$ by $\mu(\beta)$. Combining
the results of (2.8) and (2.9):

(2.10)   $\dfrac{D_N(\beta) - N\mu_N(\beta)}{N^{1/2}\sigma_\beta} \xrightarrow{d} N(0,1)$.


By a standard result of Mann-Wald theory (cf. [24] for the result and
notation), if a sequence of random variables $X_n$ converges in distribution
to a random variable X, then $X_n = O_p(1)$. Thus by (2.10)

$\dfrac{D_N(\beta) - N\mu_N(\beta)}{N^{1/2}} = O_p(1)$,

which implies $D_N(\beta) = N\mu_N(\beta) + O_p(\sqrt N)$ and $D_N(0) = N\mu_N(0) + O_p(\sqrt N)$.
Combining we get

(2.11)   $D_N(\beta) = D_N(0) + N(\mu_N(\beta) - \mu_N(0)) + O_p(\sqrt N)$ for each $\beta$.
To proceed further in proving consistency, we need the behavior of
$\mu_N(\beta) - \mu_N(0)$. Specifically we show that for any fixed $\beta \ne 0$ there is
$\eta > 0$ such that $\mu_N(\beta) - \mu_N(0) > \eta$ for all N sufficiently large. We do
this in several steps.

Lemma 2.1 $\mu(\beta) = \mu(-\beta)$. §

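Proof $\mu(\beta) = \int_0^1 J(u)\, \bar F_\beta^{-1}(u)\,du$

$\qquad = \int_0^{1/2} J(\tfrac12 + u)\,[\bar F_\beta^{-1}(\tfrac12 + u) - \bar F_\beta^{-1}(\tfrac12 - u)]\,du$

since $J(\tfrac12 - u) = -J(\tfrac12 + u)$ by Assumption J1. Hence to show $\mu(\beta) = \mu(-\beta)$
it suffices to prove

(2.12)   $\bar F_\beta^{-1}(\tfrac12 + u) - \bar F_\beta^{-1}(\tfrac12 - u) = \bar F_{-\beta}^{-1}(\tfrac12 + u) - \bar F_{-\beta}^{-1}(\tfrac12 - u)$ for all $u \in [0, \tfrac12)$.

As before $\bar F_\beta$ is strictly increasing, implying $\bar F_\beta$ has a unique inverse.
Let $w = \bar F_\beta^{-1}(\tfrac12 + u)$, so $\bar F_\beta(w) = \tfrac12 + u$. Then

$\tfrac12 + u = \int_{-B_c}^{B_c} F(w + \beta c)\, dH(c)$

$\qquad = \int_{-B_c}^{B_c} [1 - F(-w - \beta c)]\, dH(c)$ (since f is symmetric)

$\qquad = 1 - \int_{-B_c}^{B_c} F(-w - \beta c)\, dH(c)$; thus

$\tfrac12 - u = \int_{-B_c}^{B_c} F(-w - \beta c)\, dH(c) = \bar F_{-\beta}(-w)$, implying

$\bar F_{-\beta}^{-1}(\tfrac12 - u) = -w$,

so

(2.13)   $\bar F_\beta^{-1}(\tfrac12 + u) = -\bar F_{-\beta}^{-1}(\tfrac12 - u)$.

This argument also shows $\bar F_\beta^{-1}(\tfrac12 - u) = -\bar F_{-\beta}^{-1}(\tfrac12 + u)$, which combined with
(2.13) yields (2.12). Thus $\mu(\beta) = \mu(-\beta)$. §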
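Lemma 2.1 can be checked numerically. The Python sketch below assumes, for illustration only, standard normal errors and a discrete, deliberately asymmetric design distribution H; its atoms and weights are arbitrary. The proof above uses only the symmetry of f, not of H. The sketch inverts $\bar F_\beta$ by bisection and evaluates $\mu(\beta) = \int_0^1 J(u)\,\bar F_\beta^{-1}(u)\,du$ by the midpoint rule on $[\alpha, 1-\alpha]$, where J vanishes outside that interval. The score J is a hypothetical choice satisfying J1 and J2.

```python
import math

# Hypothetical design distribution H: atoms and weights (deliberately not symmetric).
ATOMS = (-1.0, 0.5, 2.0)
WEIGHTS = (0.2, 0.5, 0.3)
ALPHA = 0.2  # trimming proportion of the score below


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def J(u):
    """Hypothetical trimmed score satisfying J1: J(1/2 + t) = t * (w - |t|), w = 1/2 - ALPHA."""
    t = u - 0.5
    w = 0.5 - ALPHA
    return 0.0 if abs(t) >= w else t * (w - abs(t))


def fbar(x, beta):
    """bar F_beta(x) = int F(x + beta c) dH(c), with F standard normal."""
    return sum(w * normal_cdf(x + beta * c) for c, w in zip(ATOMS, WEIGHTS))


def fbar_inv(u, beta, lo=-50.0, hi=50.0, iters=100):
    """Invert the continuous, strictly increasing bar F_beta by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fbar(mid, beta) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mu(beta, m=400):
    """mu(beta) = int_0^1 J(u) bar F_beta^{-1}(u) du; J = 0 off [ALPHA, 1 - ALPHA]."""
    a, b = ALPHA, 1.0 - ALPHA
    h = (b - a) / m
    return h * sum(J(a + (i + 0.5) * h) * fbar_inv(a + (i + 0.5) * h, beta) for i in range(m))


print(mu(0.7), mu(-0.7))   # should agree to within rounding error
```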


Because of this result we will restrict attention to $\beta > 0$
throughout the rest of this chapter, since the results for $\beta < 0$ follow
in an analogous fashion.

We define:

(2.14)   $C(u) = \bar F_\beta^{-1}(\tfrac12 - u) - \bar F_\beta^{-1}(\tfrac12 + u) + 2F^{-1}(\tfrac12 + u)$.

Then we have

Lemma 2.2 $\mu(\beta) - \mu(0) = -\int_0^{1/2} J(u + \tfrac12)\, C(u)\,du$. §

Proof Note that $\bar F_0(u) = \int_{-B_c}^{B_c} F(u)\, dH(c) = F(u)$, so
$\bar F_0^{-1}(u) = F^{-1}(u)$. Thus $\mu(\beta) - \mu(0) = \int_0^1 J(u)\,[\bar F_\beta^{-1}(u) - F^{-1}(u)]\,du$;
the result easily follows on recalling $J(\tfrac12 - u) = -J(\tfrac12 + u)$ and, by the
symmetry of F, $F^{-1}(\tfrac12 - u) = -F^{-1}(\tfrac12 + u)$. §


Lemma 2.3 If $\beta \ne 0$ then $\mu(\beta) > \mu(0)$. §

Proof By Assumption J1, $J(\tfrac12 + u) \ge 0$ for $u \in [0, \tfrac12]$, with strict inequality
on some interval; so to prove the result it suffices to show $C(0) = 0$
and $C(u) < 0$ for $u \in (0, \tfrac12)$. Now $C(0) = \bar F_\beta^{-1}(\tfrac12) - \bar F_\beta^{-1}(\tfrac12) + 2F^{-1}(\tfrac12) = 0$
by the symmetry of f. Let $u \in (0, \tfrac12)$ and set $d = F^{-1}(\tfrac12 + u) > 0$. Suppose
x is defined by $\bar F_\beta(x) = F(d)$, so $x = \bar F_\beta^{-1}(\tfrac12 + u)$. Then

$C(u) = \bar F_\beta^{-1}(\tfrac12 - u) - \bar F_\beta^{-1}(\tfrac12 + u) + 2F^{-1}(\tfrac12 + u)$

$\qquad = \bar F_\beta^{-1}(\tfrac12 - u) - x + 2d.$

If we denote $y = \bar F_\beta^{-1}(\tfrac12 - u)$ and $y_0 = x - 2d$, to show $C(u) < 0$ it will suffice
to show $y < y_0$, or equivalently

(2.15)   $\bar F_\beta(y_0) = \bar F_\beta(x - 2d) > \tfrac12 - u$,

since $\bar F_\beta$ is strictly increasing. To continue the proof we need the
following

Fact: If $z \ne -d$, then $F(-d) - F(z) < F(d) - F(2d + z)$.

Proof of Fact: Suppose $z < -d$. Then (considering Figure 1)

$F(-d) - F(z) = \int_z^{-d} f(u)\,du = \int_d^{-z} f(u)\,du$

(by symmetry), so

(2.16)   $F(-d) - F(z) = \int_0^{-z-d} f(v + d)\,dv$.

Since f is unimodal and d > 0, $f(d + v) < f(d - v)$ for all v > 0, implying

$\int_0^{-z-d} f(v + d)\,dv < \int_0^{-z-d} f(d - v)\,dv = \int_{2d+z}^{d} f(u)\,du = F(d) - F(2d + z)$.

The same argument shows that if $z > -d$, then
$F(z) - F(-d) > F(2d + z) - F(d)$, or equivalently
$F(-d) - F(z) < F(d) - F(2d + z)$,
completing the proof of the Fact.

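Now

$\bar F_\beta(y_0) = \int_{-B_c}^{B_c} F(y_0 + \beta c)\, dH(c) = \int_{y_0 - \beta B_c}^{y_0 + \beta B_c} F(z)\, dH((z - y_0)/\beta)$.

By the Fact (and since $F(-d) = 1 - F(d)$), $F(z) \ge [1 - 2F(d)] + F(2d + z)$, with strict
inequality except at $z = -d$, so

$\bar F_\beta(y_0) > \int_{y_0 - \beta B_c}^{y_0 + \beta B_c} [1 - 2F(d)]\, dH((z - y_0)/\beta) + \int_{y_0 - \beta B_c}^{y_0 + \beta B_c} F(2d + z)\, dH((z - y_0)/\beta)$

(the inequality is strict because var(H) > 0, so H cannot put all its mass at the single
point $c = (-d - y_0)/\beta$). The first integral is just $1 - 2F(d)$.
The second integral $= \int_{-B_c}^{B_c} F(2d + y_0 + \beta c)\, dH(c) = \int_{-B_c}^{B_c} F(x + \beta c)\, dH(c) = \bar F_\beta(x) = F(d)$.

Thus $\bar F_\beta(y_0) > 1 - 2F(d) + F(d) = \tfrac12 - u$, implying (2.15); thus Lemma
2.3 is proved. §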
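The key inequality of Lemma 2.3, C(u) < 0 on (0, 1/2), can be checked numerically for one unimodal example. The Python sketch below assumes standard normal errors and the same kind of hypothetical, asymmetric discrete H as a test case; the proof above does not need H to be symmetric. It uses $\bar F_0 = F$, so that $F^{-1}$ is obtained from the same bisection routine with $\beta = 0$.

```python
import math

ATOMS = (-1.0, 0.5, 2.0)     # hypothetical design points (H need not be symmetric)
WEIGHTS = (0.2, 0.5, 0.3)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def fbar(x, beta):
    """bar F_beta(x) = int F(x + beta c) dH(c), F standard normal."""
    return sum(w * normal_cdf(x + beta * c) for c, w in zip(ATOMS, WEIGHTS))


def fbar_inv(u, beta, lo=-50.0, hi=50.0, iters=100):
    """Bisection inverse of bar F_beta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fbar(mid, beta) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def C(u, beta):
    """C(u) of (2.14); F^{-1} = bar F_0^{-1} = fbar_inv(., 0)."""
    return fbar_inv(0.5 - u, beta) - fbar_inv(0.5 + u, beta) + 2.0 * fbar_inv(0.5 + u, 0.0)


us = [k / 100 for k in range(1, 46)]     # grid of u values in (0, 1/2)
worst = {beta: max(C(u, beta) for u in us) for beta in (0.3, 1.0)}
print(worst)   # Lemma 2.3 predicts both maxima are negative
```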


Lemma 2.4 Let $\beta \ne 0$; then there exist $\eta(\beta) > 0$ and $N^*$ such that
$\mu_N(\beta) - \mu_N(0) > \eta(\beta)$ for all $N \ge N^*$. §

Proof We first note that $\mu_N(0) = \mu(0)$. Also, by Lemma 2.3,
$\mu(\beta) - \mu(0) > 0$. Hence it will suffice (taking, say, $\eta(\beta) = [\mu(\beta) - \mu(0)]/2$) to show
$\lim_{N\to\infty} \mu_N(\beta) = \mu(\beta)$, i.e. that

(2.17)   $\int_0^1 J(u)\, \bar F_{\beta,N}^{-1}(u)\,du \to \int_0^1 J(u)\, \bar F_\beta^{-1}(u)\,du$ as $N \to \infty$.

Since J(u) = 0 for $u \le \alpha$ and $u \ge 1 - \alpha$, the dominated convergence theorem
will imply (2.17) if we show:

(i) $\bar F_{\beta,N}^{-1}(u) \to \bar F_\beta^{-1}(u)$ for $u \in [\alpha, 1 - \alpha]$, and

(ii) there is v(u) integrable on $[\alpha, 1 - \alpha]$ such that
$|\bar F_{\beta,N}^{-1}(u)| \le v(u)$ for $u \in [\alpha, 1 - \alpha]$ for all N large.

Proof of (i): By Assumption C1, $\bar F_{\beta,N}(x) \to \bar F_\beta(x)$ for all x.
To simplify notation let $g_N(x) = \bar F_{\beta,N}(x)$ and $g(x) = \bar F_\beta(x)$.
Then we know $g_N(x) \to g(x)$ for all x, and we wish to prove
$g_N^{-1}(u) \to g^{-1}(u)$ for all $u \in [\alpha, 1 - \alpha]$. Let $\varepsilon > 0$ be given and
pick $u \in [\alpha, 1 - \alpha]$; denote $x = g^{-1}(u)$. We must show $|g_N^{-1}(u) - x| < \varepsilon$
for all N large. By the monotonicity of $g_N$ it will suffice
to show (for N large):

(2.18)   $g_N(x - \varepsilon) < u < g_N(x + \varepsilon)$.

But $g_N(x - \varepsilon) \to g(x - \varepsilon) < u$ and $g_N(x + \varepsilon) \to g(x + \varepsilon) > u$, so for all
N large (2.18) obtains, implying (i).

Proof of (ii): For $u \in [\alpha, 1 - \alpha]$, $|g_N^{-1}(u)| \le \max\{|g_N^{-1}(1 - \alpha)|, |g_N^{-1}(\alpha)|\}$.
But by (i) $g_N^{-1}(1 - \alpha) \to g^{-1}(1 - \alpha)$ and $g_N^{-1}(\alpha) \to g^{-1}(\alpha)$ as
$N \to \infty$. Therefore set $v(u) = \max\{|g^{-1}(1 - \alpha)| + 1, |g^{-1}(\alpha)| + 1\}$.
Then for all N sufficiently large $|\bar F_{\beta,N}^{-1}(u)| \le v(u)$. Trivially
v is integrable on $[\alpha, 1 - \alpha]$, implying (ii).

Thus (2.17) obtains, completing the proof of Lemma 2.4. §
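Part (i) of this proof, the convergence $\bar F_{\beta,N}^{-1}(u) \to \bar F_\beta^{-1}(u)$ on $[\alpha, 1-\alpha]$, can be illustrated numerically. The Python sketch below assumes standard normal errors and a hypothetical equally spaced design $c_i = -1 + (2i - 1)/N$, so that $H_N$ converges to the U(-1, 1) distribution. For this H the limit has a closed form: $\bar F_\beta(x) = [\psi(x+\beta) - \psi(x-\beta)]/(2\beta)$, where $\psi(t) = t\Phi(t) + \varphi(t)$ is an antiderivative of the standard normal cdf Φ and φ is the standard normal density. This gives an exact target to compare against.

```python
import math


def Phi(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def phi(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def psi(t):
    """Antiderivative of Phi: psi'(t) = Phi(t)."""
    return t * Phi(t) + phi(t)


def invert(cdf, u, lo=-30.0, hi=30.0, iters=100):
    """Bisection inverse of a continuous, strictly increasing cdf."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fbar_limit(x, beta):
    """bar F_beta(x) = int F(x + beta c) dH(c) for H = U(-1, 1), beta > 0."""
    return (psi(x + beta) - psi(x - beta)) / (2.0 * beta)


def fbar_N(x, beta, N):
    """bar F_{beta,N}(x) = N^{-1} sum_i F(x + beta c_i) for the equally spaced design."""
    cs = (-1.0 + (2 * i - 1) / N for i in range(1, N + 1))
    return sum(Phi(x + beta * c) for c in cs) / N


beta, alpha = 1.0, 0.2
us = (alpha, 0.35, 0.65, 1.0 - alpha)
errs = {}
for N in (10, 100, 1000):
    errs[N] = max(
        abs(invert(lambda x: fbar_N(x, beta, N), u) - invert(lambda x: fbar_limit(x, beta), u))
        for u in us
    )
print(errs)   # the errors should shrink as N grows
```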


Lemma 2.5 Whenever it exists, the derivative of the dispersion $D_N(\beta)$
has the bound

$|D_N'(\beta)| \le N B_J B_c$. §

Proof By Jaeckel's Theorem 1 and the remarks on p. 1451 of [18],
$D_N(\beta)$ is a non-negative, continuous, piecewise linear function of
$\beta$ (even in the case J(u) is not increasing for $u \in [\tfrac12, 1]$). By p. 1455
of [18], where it exists

$D_N'(\beta) = -\sum_{k=1}^{N} a_N(k)\, c_{(k)} = -\sum_{k=1}^{N} J(k/(N+1))\, c_{(k)}$,

where $c_{(k)}$ is the c value associated with the residual $e^{\beta}_{(k)}$. Hence

$|D_N'(\beta)| \le \sum_{k=1}^{N} |J(k/(N+1))|\,|c_{(k)}| \le N B_J B_c$

by Assumptions C1 and J2. §
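$D_N$ is continuous and piecewise linear, with slope $-\sum_k J(k/(N+1))\,c_{(k)}$ between the values of $\beta$ at which two residuals $Y_i - \beta c_i$ and $Y_j - \beta c_j$ cross, namely $\beta = (Y_i - Y_j)/(c_i - c_j)$. Its minimum over $B^\circ = [-B, B]$ is therefore attained at one of these crossing points or at an endpoint. The Python sketch below uses this observation to compute a minimizer $\hat\beta_N$ by brute-force enumeration. The enumeration is quadratic in N, so it is suitable only for small samples. The sketch also checks the bound of Lemma 2.5 on every linear piece. The trimmed non-monotone score, the simulated data, and all function names are hypothetical illustrations, not taken from [18].

```python
import itertools
import random


def score_bump(u, alpha=0.2):
    """Hypothetical trimmed, non-monotone score with J(1/2 - t) = -J(1/2 + t)."""
    t = u - 0.5
    w = 0.5 - alpha
    return 0.0 if abs(t) >= w else t * (w - abs(t))


def dispersion(beta, y, c, J):
    """D_N(beta) = sum_k J(k/(N+1)) * e_(k)."""
    n = len(y)
    resid = sorted(yi - beta * ci for yi, ci in zip(y, c))
    return sum(J(k / (n + 1)) * r for k, r in enumerate(resid, start=1))


def slope(beta, y, c, J):
    """D_N'(beta) = -sum_k J(k/(N+1)) c_(k), valid where no two residuals tie."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i] - beta * c[i])
    return -sum(J(k / (n + 1)) * c[i] for k, i in enumerate(order, start=1))


def breakpoints(y, c, B):
    """Endpoints of [-B, B] plus all residual crossings (Y_i - Y_j)/(c_i - c_j) inside it."""
    pts = {-B, B}
    for i, j in itertools.combinations(range(len(y)), 2):
        if c[i] != c[j]:
            b = (y[i] - y[j]) / (c[i] - c[j])
            if -B < b < B:
                pts.add(b)
    return sorted(pts)


def jaeckel_estimate(y, c, J, B):
    """A minimizer of D_N over [-B, B], found by evaluating D_N at every breakpoint."""
    return min(breakpoints(y, c, B), key=lambda b: dispersion(b, y, c, J))


random.seed(1)
N, B = 30, 3.0
c = [random.uniform(-1.0, 1.0) for _ in range(N)]
y = [random.gauss(0.0, 1.0) for _ in range(N)]          # true slope beta_0 = 0
beta_hat = jaeckel_estimate(y, c, score_bump, B)

B_J = max(abs(score_bump(k / (N + 1))) for k in range(1, N + 1))
B_c = max(abs(ci) for ci in c)
pts = breakpoints(y, c, B)
slopes = [slope(0.5 * (a + b), y, c, score_bump) for a, b in zip(pts, pts[1:])]
print(beta_hat, max(abs(s) for s in slopes), N * B_J * B_c)
```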
Theorem 2.1 (Consistency of Jaeckel's estimator)

Assume $\beta_0 = 0$ without loss of generality. Then under the assumptions
of Section 2.1,

$\hat\beta_N \xrightarrow{P} 0$. §

Proof The idea of the proof is the following simple observation.
For each outcome $\omega$, if $D_N(\beta)(\omega) > D_N(0)(\omega)$, then $\hat\beta_N(\omega) \ne \beta$, since $\hat\beta_N$ minimizes
$D_N(\beta)$ over all $\beta \in B^\circ$. Let $\Delta, \varepsilon > 0$ be arbitrary and choose $\beta^*$ outside
the interval $O_\Delta = (-\Delta, \Delta)$. Then by the piecewise linearity of $D_N$ and
Lemmas 2.4 and 2.5, there are an interval $I_{\beta^*} = (\beta^* - h_{\beta^*}, \beta^* + h_{\beta^*})$ and $N_{\beta^*}$
such that for all $N \ge N_{\beta^*}$

(2.19)   $\sup_{\beta \in I_{\beta^*}} |D_N(\beta) - D_N(\beta^*)| < \tfrac{N}{2}\,[\mu_N(\beta^*) - \mu(0)]$

(simply choose $h_{\beta^*} < \eta(\beta^*)/(2B_J B_c)$). Consider the collection of open
sets $\{I_{\beta^*} : \beta^* \in B^\circ - O_\Delta\}$. This collection provides an open cover of
$B^\circ - O_\Delta$, which is compact by Assumption B1. Thus there is a finite
subcover $I_{\beta_1}, \ldots, I_{\beta_p}$, say. Recall that by (2.11),

$D_N(\beta_i) = D_N(0) + N(\mu_N(\beta_i) - \mu(0)) + O_p(\sqrt N)$

(where we note that the term $O_p(\sqrt N)$ may depend on $\beta_i$). Thus for each
$\beta_i$ (i = 1, 2, ..., p), there exists $N_i$ (for which (2.19), with $\beta^* = \beta_i$,
obtains for $N \ge N_i$) such that

$P\{|D_N(\beta_i) - D_N(0) - N(\mu_N(\beta_i) - \mu(0))| \ge \tfrac{N}{2}(\mu_N(\beta_i) - \mu(0))\} < \varepsilon/p$

for all $N \ge N_i$. If we let $N^* = \max_{i=1,\ldots,p} N_i$, then for all $N \ge N^*$
the following p inequalities hold simultaneously with probability $\ge 1 - \varepsilon$:

(2.20)   $|D_N(\beta_i) - D_N(0) - N(\mu_N(\beta_i) - \mu(0))| < \tfrac{N}{2}(\mu_N(\beta_i) - \mu(0)), \quad i = 1, 2, \ldots, p$.

Now any $\beta \in B^\circ - O_\Delta$ satisfies $\beta \in I_{\beta_k}$ for some k = 1, 2, ..., p; also
for all $N \ge N^*$

$|D_N(\beta) - D_N(\beta_k)| < \tfrac{N}{2}(\mu_N(\beta_k) - \mu(0))$

by (2.19). Hence by (2.20) we obtain

$P\{\inf_{\beta \in B^\circ - O_\Delta} D_N(\beta) > D_N(0)\} \ge 1 - \varepsilon$

for all $N \ge N^*$. Thus $P\{\hat\beta_N \notin (-\Delta, \Delta)\} \le \varepsilon$ for all $N \ge N^*$, implying
$\hat\beta_N \xrightarrow{P} 0$ since $\varepsilon$ and $\Delta$ were arbitrary. §


2.3  Counterexample 

The  aim  of  this  section  is  to  show  that,  in  the  case  of  a 
non-monotone  score  function  J(u),  some  sort  of  condition  (not  a 
regularity  condition)  needs  to  be  invoked  on  the  distribution  of  the 
errors  in  order  that  Jaeckel's  estimator  have  the  desired  asymptotic 
properties.  In  the  last  section  we  proved  the  consistency  of  Jaeckel's 
estimator  assuming  f  is  unimodal;  the  counterexample  that  follows 
shows  that  this  assumption  cannot  be  dropped  without  invoking  some 
other  conditions.  In  the  case  of  non-monotone  scores,  there  are  non- 
unimodal  densities  (otherwise  well-behaved)  for  which  Jaeckel's  estimator 
is  not  consistent  for  the  true  slope  parameter. 
Theorem 2.2

There exist a non-monotone score function J(u) and a non-unimodal
density f, with J satisfying the assumptions of Section 2.1 and
f satisfying all but the unimodality assumption of Section 2.1, for
which $\hat\beta_N$ does not converge in probability to 0. §

To carry out the proof we need

Lemma 2.6

There exist J and f as described in Theorem 2.2 and $\beta' \ne 0$
such that $\mu(\beta') < \mu(0)$. §

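Proof Recall $\mu(\beta) - \mu(0) = -\int_0^{1/2} J(u + \tfrac12)\, C(u)\,du$. Also C(0) = 0.
We will find $\beta'$ and $\varepsilon > 0$ such that C(u) > 0 for $u \in (0, \varepsilon)$. This
will be sufficient to prove the lemma since we will then choose
(cf. Figure 2) J such that $J(u + \tfrac12) = 0$ for $u \ge \varepsilon$ and $J(u + \tfrac12) > 0$
for $u \in (0, \varepsilon/2)$, say.

[FIGURE 2: the score function J(u)]

We'll assume H has a density h, symmetric about 0. Since

$\dfrac{d}{du}\,\bar F_\beta^{-1}(\tfrac12 \mp u) = \mp\left\{\int_{-B_c}^{B_c} f(\bar F_\beta^{-1}(\tfrac12 \mp u) + \beta c)\, h(c)\,dc\right\}^{-1}$,

we have

$C'(u) = \dfrac{2}{f(F^{-1}(\tfrac12 + u))} - \left\{\int_{-B_c}^{B_c} f(\bar F_\beta^{-1}(\tfrac12 - u) + \beta c)\, h(c)\,dc\right\}^{-1} - \left\{\int_{-B_c}^{B_c} f(\bar F_\beta^{-1}(\tfrac12 + u) + \beta c)\, h(c)\,dc\right\}^{-1}$.

Therefore

$C'(0) = 2\left[\dfrac{1}{f(0)} - \dfrac{1}{\int_{-B_c}^{B_c} f(\bar F_\beta^{-1}(\tfrac12) + \beta c)\, h(c)\,dc}\right]$.

If h is symmetric about 0 it is easy to show that $\bar F_\beta^{-1}(\tfrac12) = 0$, so

(2.21)   $C'(0) = 2\left[\dfrac{1}{f(0)} - \dfrac{1}{\int_{-B_c}^{B_c} f(\beta c)\, h(c)\,dc}\right]$.

Now $B_c$ is fixed. If f(x) > f(0) in an open neighborhood of 0
(with 0 deleted), and f(0) > 0, then by taking $\beta$ sufficiently
close to 0 ($\beta \ne 0$) we obtain

$\int_{-B_c}^{B_c} f(\beta c)\, h(c)\,dc > f(0)$,

implying C'(0) > 0. Since C has a finite first derivative at
t = 0, a theorem from calculus (cf. Chung [8], p. 156) implies

(2.22)   $C(t) = C(0) + C'(0)\,t + o(|t|) = t\,C'(0) + o(|t|)$ as $t \to 0$.

Thus there is $\varepsilon > 0$ such that $0 < t < \varepsilon$ implies C(t) > 0. §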
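The mechanism of Lemma 2.6 can be seen numerically. The Python sketch below assumes a hypothetical non-unimodal error distribution: the equal mixture of N(-1, 0.5²) and N(1, 0.5²), whose density has a local minimum at 0. Standard normal errors are included for comparison. H is taken to be the two-point distribution on ±1; this symmetric H is a simplification of the symmetric density h assumed in the proof. J is a hypothetical score with $J(\tfrac12 + u) = u(\varepsilon - u)$ for $0 \le u \le \varepsilon = 0.1$, extended antisymmetrically and zero elsewhere. With $\beta = 0.2$ one expects $\mu(\beta) - \mu(0) < 0$ for the mixture, as in Lemma 2.6, and $\mu(\beta) - \mu(0) > 0$ for the normal, as in Lemma 2.3.

```python
import math

EPS = 0.1                           # J(1/2 + u) = u (EPS - u) on [0, EPS], zero beyond
CS, WS = (-1.0, 1.0), (0.5, 0.5)    # two-point symmetric design distribution H


def normal_cdf(x, mean=0.0, sd=1.0):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def F_normal(x):
    """Unimodal case: standard normal errors."""
    return normal_cdf(x)


def F_bimodal(x):
    """Hypothetical non-unimodal case: 0.5 N(-1, 0.5^2) + 0.5 N(1, 0.5^2)."""
    return 0.5 * normal_cdf(x, -1.0, 0.5) + 0.5 * normal_cdf(x, 1.0, 0.5)


def J(u):
    t = u - 0.5
    return 0.0 if abs(t) >= EPS else t * (EPS - abs(t))


def fbar_inv(F, u, beta, lo=-20.0, hi=20.0, iters=100):
    """Bisection inverse of bar F_beta(x) = sum_c w F(x + beta c)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(w * F(mid + beta * c) for c, w in zip(CS, WS)) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mu(F, beta, m=400):
    """mu(beta) = int J(u) bar F_beta^{-1}(u) du over the support [1/2 - EPS, 1/2 + EPS] of J."""
    a = 0.5 - EPS
    h = 2.0 * EPS / m
    return h * sum(J(a + (i + 0.5) * h) * fbar_inv(F, a + (i + 0.5) * h, beta) for i in range(m))


beta = 0.2
gap_bimodal = mu(F_bimodal, beta) - mu(F_bimodal, 0.0)
gap_normal = mu(F_normal, beta) - mu(F_normal, 0.0)
print(gap_bimodal, gap_normal)
```

Both gaps are small in absolute terms because J is concentrated near u = 1/2, but only their signs matter for Lemmas 2.3 and 2.6.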

Proof of Theorem 2.2 We consider the same J and f functions
described in the proof of Lemma 2.6. Choose a point $\beta'$ (whose
existence is guaranteed by Lemma 2.6) such that

$\mu(0) - \mu(\beta') = \Delta > 0$,

say. By the proof of Lemma 2.4, $\mu_N(\beta') \to \mu(\beta')$, so there is N'
such that for all $N \ge N'$,

$\mu(0) - \mu_N(\beta') > \Delta/2$.

This fact, together with

$D_N(\beta') = D_N(0) + N(\mu_N(\beta') - \mu(0)) + O_p(\sqrt N)$,

implies that given $\varepsilon > 0$ there is N'' such that

(2.23)   $P\{D_N(0) - D_N(\beta') \ge N\Delta/4\} \ge 1 - \varepsilon$ for all $N \ge N''$.

We consider the neighborhood about 0 defined by

$O_0 = \{\beta : |\beta| < \Delta/(8B_J B_c)\}$.

By Lemma 2.5

(2.24)   $\inf_{\beta \in O_0} D_N(\beta) \ge D_N(0) - N\Delta/8$ w.p. 1.

Thus by (2.23)

$P\{\inf_{\beta \in O_0} D_N(\beta) \ge D_N(\beta') + N\Delta/8\} \ge 1 - \varepsilon$

for all $N \ge N''$. Hence $\hat\beta_N$ does not converge in probability to 0. §


2.4 Comments on paper of Stigler

In the next several sections we consider [29]: "Linear
functions of order statistics with smooth weight functions," by
Stephen Stigler. Our interest in this paper derives from the fact
that many of the results in it are used extensively in Sections 2.2
and 2.3 in considering the consistency of Jaeckel's estimator $\hat\beta_N$.
However, for our purposes there are several deficiencies in the paper:
Theorems 6 and 7 are incorrect (cf. Section 2.5 for a counterexample);
in our attempt to patch up Theorem 7, it was also discovered that there
is a mistake in Stigler's proof of Theorem 4, on which his later
results depend; lastly there are gaps in several of his proofs which
possibly deserve some elucidation.

In Sections 2.4-2.7 we state and prove a corrected version of
Stigler's Theorem 7, and in the course of the proof we indicate one
way of getting around the difficulty involved in his Theorem 4 (at
the expense of an extra assumption, however); in these sections we
also prove some results which help to fill in the gaps. In the
remainder of this section, we describe the problem which Stigler's
paper addresses and his notation, and outline his most important results
and their interconnections, in order to illuminate our later proofs
(which lean heavily on Stigler's proofs).

Let $X_{1n}, X_{2n}, \ldots, X_{nn}$ be independent random variables with
(possibly different) cdfs $F_{1n}, F_{2n}, \ldots, F_{nn}$. If we denote the order
statistics of this sample as $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$, then Stigler is
interested in the asymptotic behavior of the statistic

(2.25)   $S_n = n^{-1} \sum_{i=1}^{n} J(i/(n+1))\, X_{(i)}$,

where $J: [0,1] \to \mathbb{R}$ is some weight function. By asymptotic behavior
we mean the asymptotic normality of (a normalized version of) $S_n$
and the speed of convergence of $E(S_n)$ to an asymptotic value. The
paper is concerned with two set-ups: the iid case, in which $F_{kn} = F$ for all
k and n, and the more general non-iid case. In both set-ups Stigler's
results deal with the interplay between assumptions about the cdfs
(the existence or lack thereof of second moments) and assumptions
about the weight function J (whether it "trims" or not).

Theorem 2 is the basic asymptotic normality result for $S_n$,
assuming the iid case and that F has a finite second moment. Theorem 4
is the basic result containing information on how fast $E(S_n)$ converges
to $\mu(J, F) = \int_0^1 J(u)\, F^{-1}(u)\,du$, also assuming the iid case and the
existence of the second moment. The proof of normality basically
involves an application of Hajek's projection lemma to show $S_n$ is
"equivalent" to a sequence of random variables for which the standard
central limit theorem applies. The proof of Theorem 4, on the other hand,
basically involves an application of dominated convergence; unfortunately
the purported dominating function does not dominate. Theorem 5
then combines the conclusions of Theorems 2 and 4, except that the assumption
of a second moment is replaced by a much weaker tail condition, while
the extra assumption that J trims (i.e. J(u) = 0 for $u \le \alpha$ and $u \ge 1 - \alpha$)
is added.

Stigler indicates in the proof of Theorem 5 how in general this
new assumption can be used, at the places where dominated convergence
was invoked in the proofs of Theorems 2 and 4, to replace the second
moment assumption: it involves using the new assumption to bound
certain binomial tail probabilities. Lastly, Theorem 6 is the extension
of Theorems 2 and 4 to the non-iid case, and Theorem 7 extends Theorem 5
to the non-iid case. The proofs on the whole are very similar to
those in the iid case, except that the Lindeberg-Feller theorem now replaces the ordinary
central limit theorem, and certain random variables useful in bounding
various quantities, which were binomial, are now generalized binomial
random variables. Theorems 6 and 7 both contain the rate of convergence
result for $E(S_n)$, which is false. The proof of Theorem 6 outlines
the changes to the proof of Theorem 2 necessary to prove asymptotic
normality in the non-iid case, but no real proof is given of the
(false) extension of Theorem 4; Stigler says only that it follows "in an equally
straightforward manner." No proof of Theorem 7 is given; Stigler states only that
it follows from Theorem 6 as Theorem 5 followed from Theorems 2 and 4.

Our proof of a corrected version of Theorem 7 will be very similar
to Stigler's development. We will begin by proving the corrected
version of Theorem 6. This will be obtained by mimicking the proof
Stigler gives (pp. 684-686) for Theorem 4, but in the non-iid case,
by using the comments in Stigler's proof of Theorem 6,
and by circumventing the incorrect dominating argument by assuming
that J trims (in addition to the assumption of second moments). Then
the step from this corrected Theorem 6 to the corrected Theorem 7
will invoke the method described in Stigler's proof of Theorem 5;
in Section 2.7 we show how to obtain certain inequalities for the
generalized binomial distribution necessary for this step.
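The generalized binomial distribution mentioned above is the distribution of a sum of independent Bernoulli variables with possibly different success probabilities $p_1, \ldots, p_n$. A minimal Python sketch of its exact pmf is given below; it convolves in one Bernoulli variable at a time. It also checks one standard tail bound, Hoeffding's inequality $P(S - \sum_i p_i \ge t) \le \exp(-2t^2/n)$. This bound is only an illustration and is not claimed to be the inequality derived in Section 2.7. The probabilities used are arbitrary.

```python
import math
import random


def generalized_binomial_pmf(ps):
    """Exact pmf of S = sum of independent Bernoulli(p_i), by dynamic programming."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1.0 - p)       # this trial fails
            nxt[k + 1] += q * p           # this trial succeeds
        pmf = nxt
    return pmf


def upper_tail(ps, m):
    """P(S >= m)."""
    return sum(generalized_binomial_pmf(ps)[max(m, 0):])


def hoeffding_bound(ps, t):
    """Hoeffding: P(S - sum p_i >= t) <= exp(-2 t^2 / n) for t > 0."""
    return math.exp(-2.0 * t * t / len(ps))


random.seed(2)
ps = [random.uniform(0.1, 0.9) for _ in range(40)]
pmf = generalized_binomial_pmf(ps)
mean = sum(k * q for k, q in enumerate(pmf))
print(mean, sum(ps), upper_tail(ps, math.ceil(sum(ps) + 4.0)), hoeffding_bound(ps, 4.0))
```

When all $p_i$ are equal the recursion reproduces the ordinary binomial pmf, which is the iid special case discussed above.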
2.5 Counterexamples to Stigler


In his Theorem 6 Stigler asserts that $n^{1/2}(E(S_n) - \mu(J, \bar F)) \to 0$
as $n \to \infty$, where

$\lim_{n\to\infty} n^{-1} \sum_{k=1}^{n} F_{kn}(x) = \bar F(x)$.

The following example, which satisfies all of the regularity conditions
of Theorem 6, shows that in general this result is not true.

Counterexample. Let $X_1, X_2, \ldots$ be iid U(0,1) random variables, and
define the triangular array

(2.26)   $Z_{kn} = X_k + n^{-1/4}, \quad k = 1, 2, \ldots, n$.

Let F denote the cdf of the X's and $F_{kn}$ that of $Z_{kn}$. To satisfy the
conditions of the theorem we also need a cdf G with finite second
moment and some finite constant M such that $F_{kn}(y) \le G(y)$ if $y \le -M$ and
$F_{kn}(y) \ge G(y)$ if $y \ge M$. For the cdf G in this case take the cdf of a
U(-3, 3) random variable, and take M = 2. Then $\{F_{kn}\}$ and G satisfy the
requirements. Furthermore it is clear that

(2.27)   $\lim_{n\to\infty} n^{-1} \sum_{k=1}^{n} F_{kn}(x) = F(x)$,

so the limit on the left hand side exists, as is required by the theorem.
In the terminology of the theorem, then, $\bar F(x) = F(x)$. Also it is
clear that

$\lim_{n\to\infty} n^{-1} \sum_{k=1}^{n} [F_{kn}(\min(x,y)) - F_{kn}(x)\,F_{kn}(y)]$

exists, thus satisfying the requirements of the theorem. We define

(2.28)   $S_n = n^{-1} \sum_{i=1}^{n} J(i/(n+1))\, X_{(i)}, \qquad S_n^* = n^{-1} \sum_{i=1}^{n} J(i/(n+1))\, Z_{(i)}$,

where $Z_{(i)}$ is the $i$th ordered value among $\{Z_{kn}\}_{k=1}^{n}$ and J(u) is any
nice function; specifically take J(u) = 1. Clearly J satisfies all
of the regularity conditions. Then

(2.29)   $S_n^* = S_n + n^{-1/4}$.

According to the last remarks of Theorem 6,

(2.30)   (i) $n^{1/2}[E(S_n) - \mu(F)] \to 0$, (ii) $n^{1/2}[E(S_n^*) - \mu(F)] \to 0$

as $n \to \infty$, where

$\mu(F) = \int_0^1 J(u)\, F^{-1}(u)\,du$.

However, using (2.29) in (ii) we have $n^{1/2}[E(S_n) + n^{-1/4} - \mu(F)] \to 0$
as $n \to \infty$, which contradicts (i), since $n^{1/2} \cdot n^{-1/4} = n^{1/4} \to \infty$. §
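The arithmetic of this counterexample can be traced in a few lines of Python. With J ≡ 1, $S_n^*$ is the sample mean of the $Z_{kn}$, so $E(S_n^*) = \tfrac12 + n^{-1/4}$ exactly, while $\mu(F) = \tfrac12$. The sketch below contrasts two centerings. The first is the limit $\mu(J, \bar F) = \mu(F)$. The second is the finite-n value $\mu(J, \bar F_n)$, where $\bar F_n$ is the U($n^{-1/4}$, $1 + n^{-1/4}$) cdf; this is the form of centering used in (iii) of Theorem 7* as stated in Section 2.2. A small Monte Carlo check of $E(S_n^*)$ is included. The seed, replication count, and function names are arbitrary illustrative choices.

```python
import math
import random


def expected_S_star(n):
    """E(S_n^*) with J = 1: mean of U(0,1) plus the shift n^{-1/4}."""
    return 0.5 + n ** -0.25


def mu_limit():
    """mu(J, F) for F = U(0,1) and J = 1."""
    return 0.5


def mu_finite(n):
    """mu(J, bar F_n) = int_0^1 bar F_n^{-1}(u) du, bar F_n the U(n^{-1/4}, 1 + n^{-1/4}) cdf."""
    return 0.5 + n ** -0.25


def scaled_gaps(n):
    """n^{1/2} (E S_n^* - centering), for the limit and the finite-n centerings."""
    e = expected_S_star(n)
    return math.sqrt(n) * (e - mu_limit()), math.sqrt(n) * (e - mu_finite(n))


def monte_carlo_S_star(n, reps, rng):
    """Average of `reps` simulated values of S_n^* (J = 1, so S_n^* is a sample mean)."""
    shift = n ** -0.25
    return sum(sum(rng.random() + shift for _ in range(n)) / n for _ in range(reps)) / reps


for n in (10**2, 10**4, 10**6):
    print(n, scaled_gaps(n))   # first entry is n^{1/4} up to rounding; second is 0
```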

As  we  will  show  in  the  next  section,  a  simple  and  natural 
modification  of  the  offending  statement  makes  the  rate  result  obtain 
in  Theorem  6  (and  also  in  Theorem  7).  In  contrast,  the  error  associated 
with  Theorem  4  relates  to  the  proof  of  the  result  and  not  its  statement. 
It may very well be that the result stated in Stigler's Theorem 4
is correct. However, we were not able to patch up the proof without
introducing an additional assumption.

The  error  in  the  proof  of  Theorem  4  is  located  in  the  first 
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sentence  of  the  last  paragraph  of  p.  685  of  [29] :  namely  the  statement 
that  the  supremum  of  H*  (u;x)  occurs  at  u=F(x)  is  not  in  general 
true.  For  suppose  x  is  chosen  such  that  F(x)  =  j+n  for  some 

n+1 

nonnegative  integer  j  and  0<n<l  (note  the  strict  inequalities). 

Obviously  this  is  the  case  for  a.e.  x  since  F  has  a  positive  density. 

We  now  compute  H*  (F(x)  :x)  and  H*  (F(x);x).  (To  shorten  notation 
n  n 

we  will  usually  just  write  H*  (u)  for  H*  (u;x).)  For  u<F(x), 


$$H_n^*(u;x) = H_n(u;x) = n^{-1/2}\sum_{i \le (n+1)u} P(X_{(i)} > x). \tag{2.31}$$

For convenience we denote $P(X_{(i)} > x) = p_i$, suppressing the dependence on $n$ and $x$. We consider $u \in [j/(n+1), F(x))$; in this interval it turns out that $H_n^*$ is constant. For such $u$, $[(n+1)u] = j$, where the brackets denote the greatest integer function. Thus

$$H_n^*(u) = n^{-1/2}\sum_{i=1}^{j} p_i,$$

and hence

$$H_n^*(F(x)^-) = \lim_{u \uparrow F(x)} H_n^*(u) = n^{-1/2}\sum_{i=1}^{j} p_i. \tag{2.32}$$

Now we consider $u = F(x)$. Note that

$$H_n^*(F(x)) = H_n(F(x)) - a_n(F(x)). \tag{2.33}$$


But $H_n(F(x)) = n^{1/2}(1 - F(x)) - n^{-1/2}\sum_{i > (n+1)F(x)} p_i$. By the calculations done after equation (12) on p. 684 of [29],

$$\sum_{i=1}^{n} p_i = n - nF(x), \tag{2.34}$$

implying

$$\sum_{i > (n+1)F(x)} p_i = \sum_{i=j+1}^{n} p_i = n - nF(x) - \sum_{i=1}^{j} p_i,$$

so

$$H_n(F(x)) = H_n^*(F(x)^-). \tag{2.35}$$

But $a_n(F(x)) = n^{-1/2}[(n+1)F(x)] - n^{1/2}F(x) = n^{-1/2}j - n^{1/2}(j+\eta)/(n+1)$, so

$$a_n(F(x)) = \frac{j - n\eta}{\sqrt{n}\,(n+1)}. \tag{2.36}$$

Combining (2.33), (2.35), and (2.36) we obtain

$$H_n^*(F(x)) = H_n^*(F(x)^-) - \frac{j - n\eta}{\sqrt{n}\,(n+1)}. \tag{2.37}$$

If $\eta < j/n$, then $j - n\eta > 0$. This implies $H_n^*(F(x)) < H_n^*(F(x)^-)$, which in turn implies that $\sup_u H_n^*(u;x)$ does not necessarily occur at $u = F(x)$. Indeed, for any $x$ such that

$$n\{(n+1)F(x) - [(n+1)F(x)]\} < [(n+1)F(x)],$$

the supremum will not be at $u = F(x)$. However, $H_n^*$ is non-decreasing for $u < F(x)$ and non-increasing for $u > F(x)$. From this fact and the above calculations we can say

$$\sup_u H_n^*(u;x) \le H_n^*(F(x);x) + |a_n(F(x))|. \tag{2.38}$$

The main difficulty that the correct statement (2.38) causes in the proof is not in showing that

$$\Big|\int_0^1 \{J(u) - J(F(x))\}\,dH_n^*(u;x)\Big| \to 0.$$

The difficulty lies instead in bounding this sequence by an integrable function so that dominated convergence can be invoked.
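The jump computed in (2.36)-(2.37) is easy to confirm numerically. The Python sketch below evaluates $H_n^*(u;x)$ in the iid case, using $P(X_{(i)} > x) = P(\mathrm{Bin}(n, F(x)) < i)$. All function names are ours and purely illustrative. We take one illustrative choice, $n = 10$, $j = 5$, $\eta = 0.3$, so that $\eta < j/n$. The code checks the algebra and is not part of the argument.

```python
import math

def exceed_probs_iid(n, F):
    """p[i-1] = P(X_(i) > x) = P(Bin(n, F) < i) for an iid sample with F = F(x)."""
    pmf = [math.comb(n, k) * F**k * (1 - F) ** (n - k) for k in range(n + 1)]
    probs, running = [], 0.0
    for i in range(1, n + 1):
        running += pmf[i - 1]          # running = P(Y <= i - 1) = P(Y < i)
        probs.append(running)
    return probs

def a_n(u, n):
    """a_n(u) = n^{-1/2}[(n+1)u] - n^{1/2} u, with a_n(1) = 0."""
    if u >= 1:
        return 0.0
    return math.floor((n + 1) * u) / math.sqrt(n) - math.sqrt(n) * u

def h_star(u, n, F, p):
    """H*_n(u; x) = H_n(u; x) - a_n(u) 1{u >= F(x)}, following (2.31)-(2.33)."""
    m = math.floor((n + 1) * u)
    if u < F:
        return sum(p[:m]) / math.sqrt(n)                       # sum over i <= (n+1)u
    h = math.sqrt(n) * (1 - u) - sum(p[m:]) / math.sqrt(n)    # sum over i > (n+1)u
    return h - a_n(u, n)

# Illustrative values: F(x) = (j + eta)/(n + 1) with eta < j/n.
n, j, eta = 10, 5, 0.3
F = (j + eta) / (n + 1)
p = exceed_probs_iid(n, F)
left = h_star(F - 1e-12, n, F, p)                       # H*_n(F(x)^-)
right = h_star(F, n, F, p)                              # H*_n(F(x))
predicted = (j - n * eta) / (math.sqrt(n) * (n + 1))    # a_n(F(x)) from (2.36)
print(f"H*(F-) = {left:.6f}, H*(F) = {right:.6f}, jump = {left - right:.6f}, (2.36) = {predicted:.6f}")
```

With these values the left limit exceeds the value at $u = F(x)$ by exactly $a_n(F(x)) = 2/(11\sqrt{10})$. The supremum is therefore not attained at $u = F(x)$.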


2.6  Proofs  of  corrected  results 


We  will  now  prove  a  modified  version  of  Theorem  6: 


Theorem 6* Suppose that, for some distribution function $G(y)$ of a random variable $Y$ with $E(Y^2) < \infty$, $F_{kn}(y) \le G(y)$ whenever $y \le -M$ and $F_{kn}(y) \ge G(y)$ whenever $y \ge M$, where $M$ is some finite constant. Assume that for a.e. $x, y$ with respect to Lebesgue measure the following limits exist:

$$\lim_{n\to\infty}\bar F_n(x) = F(x), \quad\text{where } \bar F_n(x) = n^{-1}\sum_{k=1}^{n} F_{kn}(x);$$

and

$$\lim_{n\to\infty} n^{-1}\sum_{k=1}^{n}\{F_{kn}(\min(x,y)) - F_{kn}(x)F_{kn}(y)\} = K(x,y).$$

Then, if $J(u)$ is bounded and continuous a.e. $F^{-1}$,

$$n\,\sigma^2(S_n) \to \sigma^2(J,F,K) \ \ \text{(given below) as } n \to \infty;$$

and if $\sigma^2(J,F,K) > 0$, then

$$\frac{S_n - E(S_n)}{\sigma(S_n)} \to N(0,1) \quad\text{as } n \to \infty.$$

Here

$$\sigma^2(J,F,K) = \int\!\!\int J(F(x))\,J(F(y))\,K(x,y)\,dx\,dy.$$

Suppose in addition that $J$ satisfies a Hölder condition with $\gamma > \tfrac12$ (cf. Section 2.1), except possibly at a finite set of points $t_1, \dots, t_p$ with $\bar F_n^{-1}$ measure 0 for each $n$. Suppose also that $J(u) = 0$ for $u \le \alpha$ and $u \ge 1 - \alpha$, and that

$$\int\{G(y)(1 - G(y))\}^{1/2}\,dy$$

is finite. Then

$$n^{1/2}\big(E(S_n) - \mu(J,\bar F_n)\big) \to 0, \quad\text{where}\quad \mu(J,\bar F_n) = \int_0^1 J(u)\,\bar F_n^{-1}(u)\,du. \tag{2.39}$$

§

Proof The proof of the asymptotic normality is just Stigler's proof on page 689 (see, however, Section 2.7 for elucidation on the use of Chebychev's inequality for the generalized binomial distribution). The task of this proof is to show that (2.39) obtains by appropriately modifying Stigler's proof of Theorem 4. (The following should be read in conjunction with that proof.) Without loss of generality we assume 0 is a median of $F$. As in his proof, integration by parts yields

$$E(S_n) = \int_0^\infty \Big\{ n^{-1}\sum_{i=1}^{n} J\Big(\frac{i}{n+1}\Big) P(X_{(i)} > x) \Big\}\,dx - \int_{-\infty}^{0} \Big\{ n^{-1}\sum_{i=1}^{n} J\Big(\frac{i}{n+1}\Big) P(X_{(i)} \le x) \Big\}\,dx. \tag{2.40}$$


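We will handle only the first integral, since the result for the second follows in a similar manner. We define

$$I_n = n^{1/2}\int_0^\infty \Big\{n^{-1}\sum_{i=1}^{n} J\Big(\frac{i}{n+1}\Big)P(X_{(i)} > x) - \int_{\bar F_n(x)}^{1} J(u)\,du\Big\}\,dx = \int_0^\infty\int_0^1 J(u)\,dH_n(u;x)\,dx,$$

where

$$H_n(u;x) = \begin{cases} n^{1/2}(1-u) - n^{-1/2}\displaystyle\sum_{i > (n+1)u} P(X_{(i)} > x), & u \ge \bar F_n(x),\\[1ex] n^{-1/2}\displaystyle\sum_{i \le (n+1)u} P(X_{(i)} > x), & u < \bar F_n(x). \end{cases}$$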


Define

$$a_n(u) = n^{-1/2}[(n+1)u] - n^{1/2}u \quad (\text{with } a_n(1) = 0),$$

and

$$H_n^*(u;x) = H_n(u;x) - a_n(u)\,I_{\{u \ge \bar F_n(x)\}}(u).$$

Then $I_n = I_{1n} + I_{2n}$, where

$$I_{1n} = \int_0^\infty\int_0^1\{J(u) - J(\bar F_n(x))\}\,dH_n^*(u;x)\,dx,$$

and

$$I_{2n} = \int_0^\infty\Big\{\int_{\bar F_n(x)}^{1} J(u)\,da_n(u) + J(\bar F_n(x))\,a_n(\bar F_n(x))\Big\}\,dx.$$

We shall show that $I_{1n}$ and $I_{2n} \to 0$ as $n \to \infty$ by the dominated convergence theorem.


We note that $H_n^*(u)$ is monotone non-decreasing for $u < \bar F_n(x)$ and monotone non-increasing for $u \ge \bar F_n(x)$. By the reasoning leading to (2.38),

$$\sup_u H_n^*(u) \le H_n^*(\bar F_n(x)) + |a_n(\bar F_n(x))|. \tag{2.41}$$

We also note that the $X_k$'s are now not iid $F$; instead $X_k \sim F_{kn}$, $k = 1, 2, \dots, n$. Thus $P(X_{(i)} > x)$ is no longer the (lower) tail probability of a binomial random variable. It is the tail probability $P(Y_n < i)$ of the generalized binomial random variable $Y_n = \#\{X_k \le x\}$, which has parameter $\mathbf{p}_n = (F_{1n}(x), \dots, F_{nn}(x))$ and mean $n\bar F_n(x)$.

Consider a fixed $x \ne F^{-1}(t_1), \dots, F^{-1}(t_p)$. Note that for all $n$ sufficiently large,

$$\min(|\bar F_n(x) - t_1|, \dots, |\bar F_n(x) - t_p|) \ge n^{-1/8}.$$

Define

$$A_n = \{u : |u - \bar F_n(x)| < n^{-1/8} \text{ and } 0 \le u \le 1\}.$$

Chebychev's inequality for fourth powers of a generalized binomial random variable (cf. also Section 2.7) and the boundedness of $J$ then give

$$\Big|\int_0^1 \{J(u) - J(\bar F_n(x))\}\,dH_n^*(u;x)\Big| = \Big|\int_{A_n}\{J(u) - J(\bar F_n(x))\}\,dH_n^*(u;x)\Big| + o(1) \le \sup_{u \in A_n}|J(u) - J(\bar F_n(x))| \cdot \int|dH_n^*(u;x)| + o(1), \tag{2.42}$$

where $\int|dH_n^*|$ is the total variation of the measure $H_n^*$. Since $H_n^*$ is monotone increasing and then monotone decreasing,

$$\int|dH_n^*| \le 2\sup_u H_n^*(u;x).$$

Thus the right hand side of (2.42) is less than or equal to

$$2\sup_{u \in A_n}|J(u) - J(\bar F_n(x))| \cdot \sup_u H_n^*(u;x) + o(1).$$

Since the length of $A_n$ goes to 0, $\sup_{u \in A_n}|J(u) - J(\bar F_n(x))| \to 0$ as $n \to \infty$. Thus, to show $I_{1n} \to 0$, we need only show two things: that $\sup_u H_n^*(u;x)$ is bounded, and that the sequence of functions

$$\Big|\int_0^1\{J(u) - J(\bar F_n(x))\}\,dH_n^*(u;x)\Big|$$

on $x \in [0,\infty)$ is bounded by an integrable function.

As in Stigler's proof,

$$H_n^*(\bar F_n(x);x) = n^{-1/2}E\{\max(n\bar F_n(x) - [(n+1)\bar F_n(x)],\ n\bar F_n(x) - Y_n)\}. \tag{2.43}$$

Now 0 is a median of $F$, but for $x > 0$ we don't know that $\bar F_n(x) \ge \tfrac12$. However, since $|n\bar F_n(x) - [(n+1)\bar F_n(x)]| \le 2$ certainly,

$$\max(n\bar F_n(x) - [(n+1)\bar F_n(x)],\ n\bar F_n(x) - Y_n) \le |n\bar F_n(x) - Y_n| + 2. \tag{2.44}$$

Then Stigler's reasoning gives

$$H_n^*(\bar F_n(x)) \le \sqrt{\bar F_n(x)(1 - \bar F_n(x))} + 2/\sqrt{n}.$$

The key step is $E|n\bar F_n(x) - Y_n| \le \{\mathrm{Var}(Y_n)\}^{1/2} \le \{n\bar F_n(x)(1 - \bar F_n(x))\}^{1/2}$. It is easy to show that $|a_n(\bar F_n(x))| \le 1/\sqrt{n}$. Combining this with (2.41) yields, for all $n$ sufficiently large,

$$\sup_u H_n^*(u;x) \le H_n^*(\bar F_n(x)) + |a_n(\bar F_n(x))| \le \sqrt{\bar F_n(x)(1 - \bar F_n(x))} + 1 \le 2.$$

This establishes the boundedness of $\sup_u H_n^*(u;x)$.
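The bound on $\sup_u H_n^*(u;x)$ can also be checked in a small non-iid example. The sketch below uses illustrative values for $F_{1n}(x), \dots, F_{nn}(x)$; they are our assumption, not taken from the report. It computes the generalized binomial distribution of $Y_n$ by the usual convolution recursion and compares $H_n^*(\bar F_n(x))$ with the expectation in (2.43). Because $H_n^*$ rises up to $\bar F_n(x)$ and falls afterwards, its supremum is the larger of $H_n^*(\bar F_n(x)^-)$ and $H_n^*(\bar F_n(x))$.

```python
import math

def poisson_binomial_pmf(probs):
    """PMF of Y = Z_1 + ... + Z_n with independent Z_k ~ Bernoulli(probs[k])."""
    pmf = [1.0]
    for q in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - q)      # Z_k = 0
            nxt[k + 1] += mass * q        # Z_k = 1
        pmf = nxt
    return pmf

# Illustrative F_kn(x), k = 1..n (chosen so that (n+1) * Fbar is not an integer).
probs = [0.2, 0.35, 0.5, 0.6, 0.7, 0.75, 0.8, 0.9]
n = len(probs)
fbar = sum(probs) / n
pmf = poisson_binomial_pmf(probs)
p = [sum(pmf[:i]) for i in range(1, n + 1)]      # p[i-1] = P(X_(i) > x) = P(Y_n < i)
m = math.floor((n + 1) * fbar)

left = sum(p[:m]) / math.sqrt(n)                                   # H*_n(Fbar^-)
h_at_fbar = math.sqrt(n) * (1 - fbar) - sum(p[m:]) / math.sqrt(n)   # H_n(Fbar)
a_at_fbar = m / math.sqrt(n) - math.sqrt(n) * fbar                 # a_n(Fbar)
right = h_at_fbar - a_at_fbar                                      # H*_n(Fbar)

# Right-hand side of (2.43): n^{-1/2} E max(n Fbar - [(n+1) Fbar], n Fbar - Y_n)
rhs_243 = sum(w * max(n * fbar - m, n * fbar - k) for k, w in enumerate(pmf)) / math.sqrt(n)
bound = math.sqrt(fbar * (1 - fbar)) + 3 / math.sqrt(n)
print(f"sup H* = {max(left, right):.4f} <= {bound:.4f};  H*(Fbar) = {right:.6f} vs (2.43) {rhs_243:.6f}")
```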

Denote

$$\rho_n(x) = \int_0^1\{J(u) - J(\bar F_n(x))\}\,dH_n^*(u;x).$$

We now need to find a function $\rho(x)$, integrable on $x \in [0,\infty)$, such that $|\rho_n(x)| \le \rho(x)$ for all $n$ sufficiently large and all $x \in [0,\infty)$. We write

$$[0,\infty) = L_1 \cup L_2 = [0, F^{-1}(1 - \alpha/4)) \cup [F^{-1}(1 - \alpha/4), \infty).$$

$x \in L_1$: here it is easy to define $\rho(x)$. With $B_J = \sup_u|J(u)|$,

$$|\rho_n(x)| \le \int_0^1|J(u) - J(\bar F_n(x))|\,|dH_n^*| \le 2B_J\{2\sqrt{\bar F_n(x)(1 - \bar F_n(x))} + 2\} \le 2B_J\{2\cdot\tfrac12 + 2\} = 6B_J.$$

Thus we set $\rho(x) = 6B_J$ for $x \in L_1$.

$x \in L_2$: Let $x_0 = F^{-1}(1 - \alpha/4)$. By assumption $\bar F_n(x_0) \to 1 - \alpha/4$, so there is $N_1$ such that $\bar F_n(x_0) \ge 1 - \alpha/2$ for $n \ge N_1$. Since $\bar F_n$ is monotone, this implies $\bar F_n(x) \ge 1 - \alpha/2$ for all $x \in L_2$ if $n \ge N_1$. But $J(u) = 0$ for $u \ge 1 - \alpha$, so $J(\bar F_n(x)) = 0$ for $x \in L_2$ and $n \ge N_1$. Hence

$$\rho_n(x) = \int_0^{1-\alpha} J(u)\,dH_n^*(u;x) \quad\text{for } x \in L_2,\ n \ge N_1.$$

Since $H_n^*$ is monotone non-decreasing for $u < \bar F_n(x)$, and since $\bar F_n(x) \ge 1 - \alpha/2$, we obtain

$$|\rho_n(x)| \le B_J\,H_n^*(1-\alpha;x) = B_J\,H_n(1-\alpha;x) \quad\text{for } x \in L_2,\ n \ge N_1.$$

The equality holds because $u = 1 - \alpha < \bar F_n(x)$ and $H_n^* = H_n$ for $u < \bar F_n(x)$. Thus


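$$|\rho_n(x)| \le B_J\,n^{-1/2}\sum_{i \le (n+1)(1-\alpha)} P(X_{(i)} > x). \tag{2.45}$$

Any index $i$ in this summation satisfies $i \le (n+1)(1-\alpha)$. So for all $n \ge 4(1-\alpha)/\alpha$, say,

$$\mu_{Y_n} - i \ge n\alpha/4, \quad\text{where } P(X_{(i)} > x) = P(Y_n < i)$$

and $E(Y_n) = \mu_{Y_n} = n\bar F_n(x) \ge n(1 - \alpha/2)$. By Chebychev's inequality for fourth moments,

$$P\{|Y_n - \mu_{Y_n}| \ge \varepsilon\} \le \frac{\mu_4(Y_n)}{\varepsilon^4},$$

where $\mu_4(Y_n)$ is the 4th central moment of $Y_n$. Let $\varepsilon = n\alpha/4$. By the results of Section 2.7, $\mu_4(Y_n)$ is at most the fourth central moment of a binomial $(n, \bar F_n(x))$ variable. Hence for any $i$ in the index set

$$P(X_{(i)} > x) \le \frac{3n^2\bar F_n^2(x)\{1 - \bar F_n(x)\}^2 + n\bar F_n(x)\{1 - \bar F_n(x)\}}{(n\alpha/4)^4} \le \frac{K_\alpha\,\bar F_n(x)\{1 - \bar F_n(x)\}}{n^2},$$

where $K_\alpha = 2\cdot 4^4/\alpha^4$; the second inequality uses $\bar F_n(x)\{1 - \bar F_n(x)\} \le 1/4$. The sum in (2.45) has at most $n + 1 \le 2n$ terms, so

$$|\rho_n(x)| \le B_J\,n^{-1/2}\sum_{i \le (n+1)(1-\alpha)} P(X_{(i)} > x) \le K_\alpha B_J\,\bar F_n(x)\{1 - \bar F_n(x)\}$$

for $x \in L_2$ and all $n$ sufficiently large. Thus for $x \in L_2$ we set

$$\rho(x) = \begin{cases} K_\alpha B_J, & x < M,\\ K_\alpha B_J\,(1 - G(x)), & x \ge M. \end{cases}$$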


Then $|\rho_n(x)| \le \rho(x)$ for $x \in [0,\infty)$. For $x \ge M$ this uses $1 - \bar F_n(x) \le 1 - G(x)$, which follows from the domination assumption. Clearly $\int_0^\infty \rho(x)\,dx < \infty$, since $E(Y^2) < \infty$ implies $\int_0^\infty (1 - G(x))\,dx < \infty$ (indeed, a simple Fubini argument shows it implies $\int_0^\infty x(1 - G(x))\,dx < \infty$). Thus we can conclude that $I_{1n} \to 0$ as $n \to \infty$.

Stigler's proof that $I_{2n} \to 0$ in Theorem 4 adapts simply to the non-iid case: replace $F(x)$ by $\bar F_n(x)$ and note that his results still obtain. The proof of Theorem 6* is complete. §
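The tail estimate behind (2.45) can be illustrated numerically. The sketch below takes an illustrative non-iid configuration with $\bar F_n(x) \ge 1 - \alpha/2$. At the largest index in (2.45) it compares three quantities: the exact probability $P(Y_n < i)$, the fourth-moment Chebychev bound, and $K_\alpha\bar F_n(x)\{1 - \bar F_n(x)\}/n^2$. The value $K_\alpha = 2\cdot 4^4/\alpha^4$ is our assumption rather than a constant quoted from the report. It works because $\mu_4(Y_n) \le 3n^2\bar p^2\bar q^2 + n\bar p\bar q \le 2n^2\bar p\bar q$.

```python
import math

def poisson_binomial_pmf(probs):
    """PMF of a generalized binomial variable (sum of independent Bernoulli(probs[k]))."""
    pmf = [1.0]
    for q in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - q)
            nxt[k + 1] += mass * q
        pmf = nxt
    return pmf

alpha = 0.2
probs = [0.9, 0.98] * 200            # illustrative F_kn(x); Fbar_n(x) = 0.94 >= 1 - alpha/2
n = len(probs)
fbar = sum(probs) / n
pmf = poisson_binomial_pmf(probs)
mean = sum(k * w for k, w in enumerate(pmf))
mu4 = sum((k - mean) ** 4 * w for k, w in enumerate(pmf))

eps = n * alpha / 4
K_alpha = 2 * 4**4 / alpha**4        # assumed constant (see lead-in)
i_max = math.floor((n + 1) * (1 - alpha))    # largest index i in the sum (2.45)

tail = sum(pmf[:i_max])              # P(Y_n < i_max): the largest P(X_(i) > x) in the sum
chebyshev = mu4 / eps**4             # fourth-moment Chebychev bound
k_bound = K_alpha * fbar * (1 - fbar) / n**2

# n^{-1/2} * sum_{i <= i_max} P(Y_n < i): the factor multiplying B_J in (2.45)
cum, total = 0.0, 0.0
for i in range(1, i_max + 1):
    cum += pmf[i - 1]                # cum = P(Y_n < i)
    total += cum
rho_factor = total / math.sqrt(n)
print(f"tail={tail:.3e} <= chebyshev={chebyshev:.3e} <= K bound={k_bound:.3e}")
print(f"n^(-1/2)*sum = {rho_factor:.3e} <= K_alpha*Fbar*(1-Fbar) = {K_alpha * fbar * (1 - fbar):.3e}")
```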


Theorem 7* Suppose the moment conditions on $G$ of Theorem 6* are replaced by the condition that, for some $\varepsilon > 0$,

$$\lim_{x\to\infty} x^{\varepsilon}\,(1 - G(x) + G(-x)) = 0.$$

Suppose also that we continue to assume the conditions on $J$ in Theorem 6*, including that $J(u) = 0$ for $u \le \alpha$ and $u \ge 1 - \alpha$. Then the conclusions of Theorem 6* continue to hold. §


As we have mentioned, the proof of this modification of Theorem 6* uses exactly the techniques by which Stigler modified his proofs of his Theorems 2 and 4 to obtain his Theorem 5. Namely, one finds bounds for certain (generalized) binomial tail probabilities which allow the continued use of the dominated convergence theorem under the modified assumptions. We will not prove the result in detail. Section 2.7, however, proves that certain moments of the generalized binomial distribution behave nicely. As a result, the bounds for the needed tail probabilities in the non-iid case are analogous to those Stigler uses in proving Theorem 5 in the iid case.

2.7  Miscellaneous  results 

Throughout this section let $Y$ be a generalized binomial random variable with parameters $n$ and $\mathbf{p}_n = (p_1, p_2, \dots, p_n)^T$. That is, $Y = Z_1 + Z_2 + \dots + Z_n$, where the $Z$'s are independent Bernoulli random variables with $E(Z_i) = p_i$. Let $X$ be a (regular) binomial random variable with parameters $n$ and $\bar p$, where $\bar p = n^{-1}(p_1 + \dots + p_n)$.

Lemma 2.7 Denote the 4th central moment of a random variable $W$ by $\mu_4(W)$; that is, $\mu_4(W) = E\{(W - \mu_W)^4\}$, where $\mu_W = E(W)$. Then $\mu_4(Y) \le \mu_4(X)$. §

(In words: a generalized binomial distribution has a smaller fourth central moment than the "equivalent" regular binomial distribution.)
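Lemma 2.7 is easy to spot-check numerically. The sketch below uses illustrative probabilities, and the helper names are ours. It compares the closed-form fourth central moment used in the proof with a direct computation from the exact distribution of $Y$. It also checks that the formula with all $p_i = \bar p$ reproduces the binomial value $3n^2\bar p^2\bar q^2 + n\bar p\bar q(1 - 6\bar p\bar q)$ for $X$.

```python
def poisson_binomial_pmf(probs):
    """PMF of Y = Z_1 + ... + Z_n with independent Z_k ~ Bernoulli(probs[k])."""
    pmf = [1.0]
    for q in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - q)
            nxt[k + 1] += mass * q
        pmf = nxt
    return pmf

def mu4_from_pmf(pmf):
    """Fourth central moment computed directly from a PMF on 0, 1, ..., n."""
    mean = sum(k * w for k, w in enumerate(pmf))
    return sum((k - mean) ** 4 * w for k, w in enumerate(pmf))

def mu4_formula(probs):
    """sum_i (p_i q_i^4 + q_i p_i^4) + 3 sum_{i != j} p_i q_i p_j q_j (ordered pairs i != j)."""
    v = [p * (1 - p) for p in probs]
    single = sum(p * (1 - p) ** 4 + (1 - p) * p**4 for p in probs)
    cross = sum(v) ** 2 - sum(t * t for t in v)
    return single + 3 * cross

probs = [0.1, 0.3, 0.55, 0.8, 0.95]      # illustrative p_1, ..., p_n
n = len(probs)
pbar = sum(probs) / n
mu4_Y = mu4_formula(probs)
mu4_X = mu4_formula([pbar] * n)          # the "equivalent" regular binomial
pq = pbar * (1 - pbar)
binom_closed = 3 * n**2 * pq**2 + n * pq * (1 - 6 * pq)
print(mu4_Y, mu4_from_pmf(poisson_binomial_pmf(probs)), mu4_X, binom_closed)
```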

Proof The proof will make use of the concept of Schur convexity (cf. [25]). Tedious calculation shows that

$$\mu_4(Y) = \sum_{i=1}^{n}(p_i q_i^4 + q_i p_i^4) + 3\sum_{i \ne j} p_i q_i p_j q_j$$

and

$$\mu_4(X) = \sum_{i=1}^{n}(\bar p\,\bar q^4 + \bar q\,\bar p^4) + 3\sum_{i \ne j}\bar p^2\bar q^2,$$

where $q_i = 1 - p_i$ ($i = 1, \dots, n$), $\bar q = 1 - \bar p$, and the sums over $i \ne j$ run over ordered pairs. Let $S = \{p \in R^n : 0 \le p_i \le 1\}$, a convex subset of $R^n$, and consider $f: S \to R$ defined by

$$f(p_1, \dots, p_n) = \sum_{i=1}^{n}\big[p_i^4(1 - p_i) + (1 - p_i)^4 p_i\big] + 3\sum_{i \ne j} p_i(1 - p_i)\,p_j(1 - p_j).$$

Clearly $f$ is symmetric in its arguments. By result D.a.7.a of Olkin and Marshall (cf. pp. 8-9 of [25]), to show $f$ is Schur concave it suffices to show


$$\big(f_{(k)}(p) - f_{(\ell)}(p)\big)(p_k - p_\ell) \le 0 \quad\text{for all } p \in S, \tag{2.46}$$

where $f_{(k)}(p) = \partial f(p)/\partial p_k$. We now show this. A straightforward calculation shows

$$f_{(k)}(p) = 1 - 14p_k + 36p_k^2 - 24p_k^3 + T(6 - 12p_k), \tag{2.47}$$

where $T = \sum_{i=1}^{n} p_i(1 - p_i)$. This in turn yields

$$\big(f_{(k)}(p) - f_{(\ell)}(p)\big)(p_k - p_\ell) = -(p_k - p_\ell)^2\,Q,$$

where $Q = 12T + 14 - 36(p_k + p_\ell) + 24(p_k^2 + p_k p_\ell + p_\ell^2)$. Thus to show that (2.46) obtains it suffices to show $Q \ge 0$. Clearly
$T \ge p_k(1 - p_k) + p_\ell(1 - p_\ell) = p_k + p_\ell - p_k^2 - p_\ell^2$, so

$$\begin{aligned} Q &\ge 12(p_k + p_\ell) - 12p_k^2 - 12p_\ell^2 + 14 - 36(p_k + p_\ell) + 24(p_k^2 + p_k p_\ell + p_\ell^2)\\ &= 14 - 24(p_k + p_\ell) + 12(p_k + p_\ell)^2\\ &= 12(p_k + p_\ell - 1)^2 + 2 > 0. \end{aligned}$$

Hence $f$ is Schur concave on $S$. By Theorem D.a.7 of [25], if $x \prec y$ on $S$, then $f(x) \ge f(y)$ (cf. [25] for notation). But it is well known (p. 23 of [25]) that

$$(\bar p, \bar p, \dots, \bar p) \prec (p_1, p_2, \dots, p_n),$$

so $f(\bar p, \bar p, \dots, \bar p) \ge f(p_1, p_2, \dots, p_n)$, implying $\mu_4(X) \ge \mu_4(Y)$. §

Lemma 2.8 Denote the $k$th factorial moment of a random variable $W$ by $\mu_{(k)}(W)$; that is, $\mu_{(k)}(W) = E(W^{(k)}) = E\{W(W-1)\cdots(W-k+1)\}$. Then $\mu_{(k)}(Y) \le \mu_{(k)}(X)$. §

Proof If $W$ assumes values $0, 1, 2, \dots$, define $\phi_W(t) = E(t^W)$. Then $\mu_{(k)}(W) = E(W^{(k)}) = \phi_W^{(k)}(1)$, where $\phi^{(k)}$ is the $k$th derivative. Simple calculations yield $\phi_X(t) = (\bar q + \bar p t)^n$ and $\phi_X^{(k)}(1) = n^{(k)}\bar p^k$, where $n^{(k)} = n(n-1)\cdots(n-k+1)$. Also

$$\phi_Y(t) = \prod_{i=1}^{n}(q_i + p_i t).$$

To deal with $\phi_Y^{(k)}(1)$ we introduce the elementary symmetric functions. We again denote the column vector $(p_1, \dots, p_n)^T \in R^n$ by $\mathbf{p}_n$, and let $E_k^n(\mathbf{p}_n)$ denote the $k$th elementary symmetric function of $n$ arguments.

Claim: $E(Y^{(k)}) = \phi_Y^{(k)}(1) = k!\,E_k^n(\mathbf{p}_n)$.

Proof: We prove this claim by a double induction on $k$ and $n$ with $k \le n$ ($k = 0, 1, 2, \dots$ and $n = 1, 2, \dots$). For $k = 0$ and arbitrary $n$, $\phi_Y^{(0)}(1) = \phi_Y(1) = 1$ and $E_0^n(\mathbf{p}_n) = 1$, verifying the claim. Now suppose the claim has been verified for all pairs $(k', n')$ with $k' \le k$ and $n' \le n$. We verify the claim for $k' = k$ and $n' = n + 1$.

Denote $f(t) = q_{n+1} + p_{n+1}t$ and $\mathbf{p}_{n+1} = (p_1, \dots, p_n, p_{n+1})^T$. Then $\phi_{n+1}(t) = \phi_n(t)\,f(t)$, so

$$\phi_{n+1}^{(k)}(t) = \sum_{\ell=0}^{k}\binom{k}{\ell}\phi_n^{(k-\ell)}(t)\,f^{(\ell)}(t). \tag{2.48}$$

But $f^{(0)}(t) = f(t)$, $f^{(1)}(t) = p_{n+1}$, and $f^{(\ell)}(t) = 0$ for $\ell > 1$. Thus, since $f(1) = 1$,

$$\phi_{n+1}^{(k)}(1) = \phi_n^{(k)}(1) + k\,p_{n+1}\,\phi_n^{(k-1)}(1) = k!\,[E_k^n(\mathbf{p}_n) + p_{n+1}E_{k-1}^n(\mathbf{p}_n)].$$

Now split the $k$-element subsets of $\{1, \dots, n+1\}$ according to whether they contain $n+1$. This shows that

$$E_k^{n+1}(\mathbf{p}_{n+1}) = E_k^n(\mathbf{p}_n) + p_{n+1}E_{k-1}^n(\mathbf{p}_n).$$

To complete the induction we need to show, for $k = 1, 2, \dots$, that $\phi_k^{(k)}(1) = k!\,E_k^k(\mathbf{p}_k)$. For $\mathbf{p}_k$, $\phi_k(t) = p_1\cdots p_k\,t^k + (c_1 t^{k-1} + \dots + c_k)$, where $c_1, \dots, c_k$ are some constants. Thus $\phi_k^{(k)}(1) = k!\,p_1\cdots p_k = k!\,E_k^k(\mathbf{p}_k)$, and the claim holds.

Since $E(X^{(k)}) = k!\,E_k^n(\bar p, \dots, \bar p)$, to conclude the proof of the lemma we need to show that

$$E_k^n(\bar p, \dots, \bar p) \ge E_k^n(p_1, \dots, p_n). \tag{2.49}$$

One possible approach is to use a theorem due to Marcus and Lopes (cf. p. 33 of [?]). A simpler proof, however, again uses results of Olkin and Marshall about Schur convex functions. By result D.d.3 on p. 31 of [25], $E_k^n(x)$ is symmetric and Schur concave on $\{x = (x_1, \dots, x_n) : x_i \ge 0\}$. Thus $-E_k^n(x)$ is symmetric and Schur convex, so by Theorem D.a.7 of Olkin and Marshall we again have

$$-E_k^n(x) \le -E_k^n(y) \quad\text{if } x \prec y.$$

Again noting $(\bar p, \dots, \bar p) \prec (p_1, \dots, p_n)$, we obtain $E_k^n(\bar p, \dots, \bar p) \ge E_k^n(p_1, \dots, p_n)$. This verifies (2.49) and completes the proof. §
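The claim in the proof of Lemma 2.8 and inequality (2.49) can be checked on a small example. The sketch below uses illustrative probabilities, and the helper names are ours. It builds the elementary symmetric functions with exactly the recursion used in the induction. It then compares $k!\,E_k^n(\mathbf p_n)$ with factorial moments computed from the exact distribution of $Y$, and compares both with the binomial value $n^{(k)}\bar p^k$.

```python
import math

def poisson_binomial_pmf(probs):
    """PMF of Y = Z_1 + ... + Z_n with independent Z_k ~ Bernoulli(probs[k])."""
    pmf = [1.0]
    for q in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - q)
            nxt[k + 1] += mass * q
        pmf = nxt
    return pmf

def elementary_symmetric(probs):
    """E[k] = E_k^n(p_1, ..., p_n), built with E_k^{n+1} = E_k^n + p_{n+1} E_{k-1}^n."""
    E = [1.0]
    for p in probs:
        E = [(E[k] if k < len(E) else 0.0) + (p * E[k - 1] if k > 0 else 0.0)
             for k in range(len(E) + 1)]
    return E

def factorial_moment(pmf, k):
    """E{W(W-1)...(W-k+1)} from a PMF on 0, 1, ..., n (math.perm(x, k) = 0 when k > x)."""
    return sum(w * math.perm(x, k) for x, w in enumerate(pmf))

probs = [0.1, 0.3, 0.55, 0.8, 0.95]      # illustrative p_1, ..., p_n
n = len(probs)
pbar = sum(probs) / n
E = elementary_symmetric(probs)
pmf = poisson_binomial_pmf(probs)
for k in range(n + 1):
    from_claim = math.factorial(k) * E[k]       # k! E_k^n(p_n)
    direct = factorial_moment(pmf, k)           # E(Y^(k)) from the exact PMF
    binomial = math.perm(n, k) * pbar**k        # E(X^(k)) = n^(k) pbar^k
    print(k, round(from_claim, 6), round(direct, 6), round(binomial, 6))
```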
CHAPTER 3.


ADAPTING JAECKEL'S ESTIMATOR


3.1 Adaptive estimators and the kink family

The ultimate goal of the work on regression problems which we are considering is to find a method of estimation with two properties. It should be asymptotically efficient (in the sense of minimizing the asymptotic variance), and it should give excellent results in small samples for a wide spectrum of error distributions. The experience with the location problem seems to indicate that, although the goal may technically not be obtainable, in spirit one may be able to come close. The methods and insights derived in its pursuit are also very useful.

In either the regression or the location problem, if one uses L, M, or R estimators, then one can find an asymptotically optimal estimator (under certain conditions), provided one knows the error cdf $F$. But as was noted in the introduction, the form of the error distribution is seldom known. A reasonable idea for circumventing this obstacle is to somehow estimate the unknown error distribution and proceed from there. An early implementation of this idea to actually construct usable estimators in the location problem was by Jaeckel in [17]. We briefly consider his approach, since ours is very closely related. Jaeckel began with a family of estimators for the location parameter, namely the $\alpha$-trimmed means, with trimming proportion $\alpha \in [\alpha_0, \alpha_1]$ ($0 < \alpha_0 < \alpha_1 < \tfrac12$). Two characteristics of this family are of note: it is easily parameterized, and for a wide variety of error distributions
there is a member of the family which does reasonably well. However, the trimmed means are usually not asymptotically optimal. Also of great importance is the fact that the asymptotic variance $\sigma^2(\alpha)$ of the $\alpha$-trimmed mean is a fairly simple function of $F$ and $f = F'$. This means that one has some hope of estimating $\sigma^2(\alpha)$ from a moderate-sized sample. This is exactly what Jaeckel does: he constructs an estimator $s^2(\alpha)$ of $\sigma^2(\alpha)$. As his adaptive estimator he then chooses the trimmed mean (with $\alpha \in [\alpha_0, \alpha_1]$) which has the smallest estimated asymptotic variance. Under certain conditions he is then able to show that, for a given error distribution, the asymptotic variance of his adaptive trimmed mean is the same as that of the best trimmed mean for
that error distribution (with $\alpha \in [\alpha_0, \alpha_1]$). (Also see Johns [19] for a more ambitious adapting scheme in the location problem.)

The family of estimators for the vector of slope parameters $\beta$ that we wish to consider consists of the Jaeckel regression estimators (cf. Section 2.1) with monotone score functions $J(u)$ given by

$$J_\xi(u) = \begin{cases} -\xi, & u < \tfrac12 - \xi,\\ u - \tfrac12, & \tfrac12 - \xi \le u \le \tfrac12 + \xi,\\ \xi, & u > \tfrac12 + \xi, \end{cases} \tag{3.1}$$

with $\xi \in [0, \tfrac12]$ (see Figure 3). They resemble the "Wilcoxon" scores $J(u) = u - \tfrac12$, except that at $\tfrac12 - \xi$ and $\tfrac12 + \xi$ they have "kinks," beyond which they are horizontal. We will refer to them simply as kink scores.

FIGURE 3
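The kink family is easy to experiment with. The Python sketch below is a minimal illustration for a single regressor; the report's setting allows a vector of slopes. It uses scores $J_\xi(i/(N+1))$ and minimizes Jaeckel's dispersion $D(\beta) = \sum_i J_\xi(R_i/(N+1))\,(Y_i - c_i\beta)$ by ternary search. Ternary search is legitimate here because $D$ is convex in $\beta$: it is the maximum, over rearrangements of the scores, of linear functions of $\beta$. The helper names, the search bracket, and the toy data are assumptions made for illustration.

```python
def kink_score(u, xi):
    """J_xi(u) from (3.1): the Wilcoxon score u - 1/2 clipped to [-xi, xi]."""
    return min(max(u - 0.5, -xi), xi)

def dispersion(beta, y, c, xi):
    """Jaeckel's dispersion D(beta) = sum_i J_xi(R_i/(N+1)) * r_i for one regressor."""
    r = [yi - ci * beta for yi, ci in zip(y, c)]
    N = len(r)
    order = sorted(range(N), key=lambda i: r[i])     # order[rank - 1] = index with that rank
    return sum(kink_score((rank + 1) / (N + 1), xi) * r[i] for rank, i in enumerate(order))

def jaeckel_slope(y, c, xi, lo=-100.0, hi=100.0, iters=200):
    """Minimize the convex function D by ternary search on an assumed bracket [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if dispersion(m1, y, c, xi) <= dispersion(m2, y, c, xi):
            hi = m2          # a minimizer lies in [lo, m2]
        else:
            lo = m1          # a minimizer lies in [m1, hi]
    return (lo + hi) / 2

# Toy data: true slope 2 with small deterministic "errors".
c = list(range(1, 21))
noise = [((7 * i) % 5 - 2) * 0.1 for i in range(20)]
y = [2.0 * ci + e for ci, e in zip(c, noise)]
for xi in (0.1, 0.25, 0.5):          # xi = 1/2 gives the Wilcoxon scores
    print(xi, round(jaeckel_slope(y, c, xi), 4))
```

Because the kink scores sum to zero, $D$ is unchanged by adding a constant to all residuals, so no intercept needs to be estimated here.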

A simple calculation using Jaeckel's theorem (see Section 2.1) gives the asymptotic covariance matrix of the estimator with score function $J_\xi(u)$. It is proportional to (that is, it excludes the factor $\Sigma^{-1}$) the asymptotic "variance"

$$v(\xi) = \frac{-4\xi^3 + 3\xi^2}{12\Big[\int_{F^{-1}(1/2-\xi)}^{0} f^2(t)\,dt\Big]^2}. \tag{3.2}$$
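The "simple calculation" behind (3.2) can be written out as follows. This sketch assumes that Jaeckel's constant has the usual R-estimator form $\int_0^1 J^2(u)\,du \,/\, [\int J'(F(x))\,f^2(x)\,dx]^2$ for scores with $\int_0^1 J(u)\,du = 0$, which $J_\xi$ satisfies by symmetry. It also uses the symmetry of $f$ about 0:

$$\begin{aligned}
\int_0^1 J_\xi^2(u)\,du &= 2\xi^2\Big(\tfrac12 - \xi\Big) + \int_{1/2-\xi}^{1/2+\xi}\Big(u - \tfrac12\Big)^2 du = \xi^2 - 2\xi^3 + \tfrac23\xi^3 = \frac{3\xi^2 - 4\xi^3}{3},\\
\int J_\xi'(F(x))\,f^2(x)\,dx &= \int_{F^{-1}(1/2-\xi)}^{F^{-1}(1/2+\xi)} f^2(x)\,dx = 2\int_{F^{-1}(1/2-\xi)}^{0} f^2(t)\,dt,\\
v(\xi) &= \frac{(3\xi^2 - 4\xi^3)/3}{4\Big[\int_{F^{-1}(1/2-\xi)}^{0} f^2(t)\,dt\Big]^2} = \frac{-4\xi^3 + 3\xi^2}{12\Big[\int_{F^{-1}(1/2-\xi)}^{0} f^2(t)\,dt\Big]^2}.
\end{aligned}$$

At $\xi = \tfrac12$ this reduces to the familiar Wilcoxon value $1/\{12[\int f^2(t)\,dt]^2\}$.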

We note that the dependence of $v$ on $F$ is confined to the denominator, and that it is a relatively nice function of $F$. This raises an important question. Why introduce this kink family (or other such simple families) when one might instead think of simply estimating the optimal score function $\phi_f(u) = (-f'/f)(F^{-1}(u))$? In other words, why be concerned with choosing the best estimator from a small family, which contains optimal scores for only a few distributions, when one can estimate the optimal score function and base one's estimator on that? The answer lies, of course, in the virtual impossibility of getting a good pointwise estimator of $(-f'/f)(F^{-1}(u))$, a quantity involving the second derivative of $F$. The sample sizes necessary are simply prohibitive. By restricting consideration to the kink family, we have simplified the problem to basically one of estimating the integral of $f^2$. This we have some hope of doing reasonably well with moderate sample sizes.
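To see how $v(\xi)$ in (3.2) behaves across the kink family, the following sketch evaluates it for standard normal errors, which is an illustrative choice of $F$. It uses a midpoint rule for the integral; the helper names are ours. Two limits provide checks. As $\xi \to \tfrac12$ the kink scores become Wilcoxon scores and $v \to \pi/3$. As $\xi \to 0$ they become proportional to sign scores and $v \to 1/(4f^2(0)) = \pi/2$. For the normal, the integral also has the closed form $\int_q^0 \phi^2(t)\,dt = \operatorname{erf}(|q|)/(4\sqrt\pi)$.

```python
import math
from statistics import NormalDist

def integral_f2(pdf, lower, upper=0.0, m=4000):
    """Midpoint-rule approximation of the integral of pdf(t)**2 over [lower, upper]."""
    h = (upper - lower) / m
    return h * sum(pdf(lower + (k + 0.5) * h) ** 2 for k in range(m))

def kink_avar(xi, pdf, quantile):
    """v(xi) of (3.2) for a density symmetric about 0 (requires 0 < xi < 1/2)."""
    denom = integral_f2(pdf, quantile(0.5 - xi))
    return (3 * xi**2 - 4 * xi**3) / (12 * denom**2)

normal = NormalDist()        # standard normal errors (illustrative choice of F)
q = normal.inv_cdf(0.1)      # F^{-1}(1/2 - xi) for xi = 0.4
closed_form = math.erf(-q) / (4 * math.sqrt(math.pi))    # integral of phi^2 over [q, 0]
numeric = integral_f2(normal.pdf, q)
for xi in (0.01, 0.1, 0.2, 0.3, 0.4, 0.49):
    print(f"xi = {xi:4.2f}   v(xi) = {kink_avar(xi, normal.pdf, normal.inv_cdf):.4f}")
```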

To estimate $v(\xi)$ we begin with a preliminary estimate $\hat\beta_N$ of $\beta$ satisfying certain conditions (in practice we will just use the Jaeckel estimator with Wilcoxon scores). Then, from the residuals of this preliminary fit, we use standard techniques to construct an estimator $\hat f_N(t)$ of $f(t)$ and an estimator $\hat F_N^{-1}(t)$ of $F^{-1}(t)$. These give us an estimator $\hat v_N(\xi)$ of $v(\xi)$. As our estimator we then choose, basically, the kink estimator $\hat\beta^*$ which has the smallest estimated asymptotic "variance". More precisely, for technical reasons we consider only a finite set of values of $\xi$, equally spaced on $[\xi_0, \xi_1]$ ($0 < \xi_0 < \xi_1 < \tfrac12$). As far as results are concerned, this simplification is minor. The estimator obtained in this manner is shown to have asymptotic efficiency arbitrarily close to 1, relative to the best kink estimator with $\xi \in [\xi_0, \xi_1]$. The next section gives the proof of this result.


3.2  Assumptions  and  Bickel-Rosenblatt  result 

In  this  section  we  outline  the  assumptions  required  to  establish 
our  results  and  state  the  result  of  Bickel  and  Rosenblatt  [7]  which 
is  fundamental  to  our  technique. 

The model we are considering is

$$Y_i = c_i^T\beta + e_i, \quad i = 1, \dots, N, \tag{3.3}$$

where the $e_i$ are iid random variables with cdf $F$ and density $f$. We assume we have a preliminary estimate $\hat\beta_N$ of $\beta$ satisfying

(i) $\hat\beta_N$ is invariant (cf. p. 1452 of [18]);

(ii) $\hat\beta_N = \beta_0 + O_p(N^{-1/2})$, where $\beta_0$ is the true value of $\beta$.

Because of the invariance we assume without loss of generality that $\beta_0 = 0$, so $\hat\beta_N = O_p(N^{-1/2})$.

As estimates of the unknown errors we use

$$\hat e_j = Y_j - c_j^T\hat\beta_N, \quad j = 1, \dots, N. \tag{3.4}$$

We will assume $|c_j| \le B_c$. Together with the assumptions on $\hat\beta_N$, this gives $\hat e_j = e_j + O_p(N^{-1/2})$ uniformly in $j$. Our estimate of the unknown density $f$ is constructed from these residuals using the density estimator of Bickel and Rosenblatt:

$$\hat f_N(x) = (Nb(N))^{-1}\sum_{j=1}^{N} w\big((x - \hat e_j)/b(N)\big), \tag{3.5}$$

where $b(N)$ is a bandwidth going to 0 as $N \to \infty$, and $w$ is a weight function. We also define a density estimator based on the true errors:

$$f_N(x) = (Nb(N))^{-1}\sum_{j=1}^{N} w\big((x - e_j)/b(N)\big). \tag{3.6}$$

Our assumptions are:

A1. $w$ is symmetric about 0 with $\int_R w(x)\,dx = 1$. There is a finite constant $A$ such that $w$ vanishes outside $[-A, A]$; also $w$ is bounded: $|w(x)| \le B_w$. $w$ has a bounded derivative $w'$ on $(-A, A)$ with $|w'(x)| \le B_{w'}$. Also, for $(x, x+\delta) \subset (-A, A)$, $w(x+\delta) = w(x) + w'(x)\delta + o(\delta^2)$ uniformly in $x$.

A2. The density $f$ is continuous, positive, bounded, symmetric, and unimodal. (Without loss of generality we can assume $f$ is symmetric about 0.)

A3. The function $f^{1/2}$ is absolutely continuous and its derivative $f'/(2f^{1/2})$ is bounded in absolute value. Moreover

$$\int_{|z| \ge 3}|z|^{3/2}\,[\log\log|z|]^{1/2}\,[|w'(z)| + |w(z)|]\,dz < \infty.$$

A4. The second derivative $f''$ of $f$ exists and is bounded.

A5. $b(N) = o(N^{-2/9})$ and $N^{-1/4}(\log N)^2(\log\log N)^{1/2} = o(b(N))$ as $N \to \infty$.

A6. $|c_j| \le B_c$ for $j = 1, 2, \dots$

We note that in our applications we generally will use the "natural" weight function

$$w(t) = \begin{cases} \tfrac12, & |t| \le 1,\\ 0, & \text{otherwise}, \end{cases} \tag{3.7}$$

which easily satisfies A1 and the latter part of A3. Assumptions A3, A4, and A5 are of a technical nature; Bickel and Rosenblatt require them for their results.
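For concreteness, the estimator (3.5) with the natural weight (3.7) can be sketched in Python as follows. The residual values and the bandwidth are illustrative only; in the report, $b(N)$ must satisfy A5. The bisection variant is simply a faster way to evaluate the same sum for this particular $w$. For the natural weight, $\hat f_N(x)$ is the number of residuals within $b(N)$ of $x$ divided by $2Nb(N)$.

```python
import bisect

def natural_weight(t):
    """w(t) of (3.7): 1/2 for |t| <= 1 and 0 otherwise."""
    return 0.5 if abs(t) <= 1 else 0.0

def f_hat(x, residuals, b, w=natural_weight):
    """Bickel-Rosenblatt estimate (3.5): (N b)^{-1} sum_j w((x - e_j)/b)."""
    return sum(w((x - e) / b) for e in residuals) / (len(residuals) * b)

def f_hat_natural(x, sorted_res, b):
    """Same estimate for the natural weight: count residuals in [x - b, x + b] by bisection."""
    count = bisect.bisect_right(sorted_res, x + b) - bisect.bisect_left(sorted_res, x - b)
    return count / (2 * len(sorted_res) * b)

residuals = [-1.2, -0.4, -0.1, 0.0, 0.3, 0.9, 1.5]   # illustrative residuals e_hat_j
b = 0.5                                              # illustrative bandwidth b(N)
s = sorted(residuals)
h = 0.001
area = h * sum(f_hat_natural(-3 + (k + 0.5) * h, s, b) for k in range(6000))   # over [-3, 3]
print(f_hat(0.2, residuals, b), f_hat_natural(0.2, s, b), area)
```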

Under Assumptions A1-A6 the following result obtains:

Theorem 3.1 (Bickel and Rosenblatt [7]) The quantity

$$b(N)^{-1/2}\Big[Nb(N)\int[f_N(t) - f(t)]^2 a(t)\,dt - \int f(t)a(t)\,dt \cdot \int w^2(z)\,dz\Big]$$

is asymptotically distributed $N\big(0,\ 2\int\big(\int w(x+y)w(x)\,dx\big)^2 dy \cdot \int a^2(t)f^2(t)\,dt\big)$ as $N \to \infty$, where $a(t)$ is a bounded, piecewise smooth, integrable function. §

3.3 Preliminary results


In order to estimate the asymptotic variance $v(\xi)$ for different values of $\xi$, we need to estimate $\int_{F^{-1}(1/2-\xi)}^{0} f^2(t)\,dt$. As indicated, to estimate the integrand we use $\hat f_N(t)$. To estimate the lower endpoint of integration we use the inverse of the empirical cdf based on the residuals:

$$\hat F_N^{-1}(t) = \inf\{\hat e_{(i)} : i/N \ge t\}.$$

Bickel and Rosenblatt require $a(x)$ to be integrable, so their result cannot be applied without restricting the range of integration $[\hat F_N^{-1}(\tfrac12 - \xi), 0]$. We thus consider the interval of possible $\xi$ values $I = [0, \xi_1]$, where $\xi_1 < \tfrac12$ is arbitrary (in applications it is taken to be close to $\tfrac12$). Then the following result obtains, relating

$$\int_{F^{-1}(1/2-\xi)}^{0} f^2(t)\,dt$$

to our estimate of this quantity based on the residuals $\{\hat e_j\}$:

Theorem 3.2 Under Assumptions A1-A6,

$$\sup_{\xi \in I}\Big|\Big[\int_{\hat F_N^{-1}(1/2-\xi)}^{0}\hat f_N^2(t)\,dt\Big]^2 - \Big[\int_{F^{-1}(1/2-\xi)}^{0} f^2(t)\,dt\Big]^2\Big| = o_p(N^{-1/4}).$$

§
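Putting the pieces together, the plug-in estimate of $v(\xi)$ and the grid selection described in Section 3.1 might be sketched as follows. Everything here is an illustrative assumption rather than the report's code:

- the natural weight (3.7) with a fixed bandwidth;
- a midpoint rule for the integral;
- residuals centred so that $\hat F_N^{-1}(\tfrac12 - \xi) < 0$;
- a synthetic "residual" sample placed at normal quantiles, so that the target $\int_{F^{-1}(1/2-\xi)}^0 f^2(t)\,dt = \operatorname{erf}(|F^{-1}(\tfrac12-\xi)|)/(4\sqrt\pi)$ is known.

```python
import bisect
import math
from statistics import NormalDist

def empirical_quantile(sorted_res, t):
    """F_hat_N^{-1}(t) = inf{e_(i) : i/N >= t}, for 0 < t <= 1."""
    N = len(sorted_res)
    return sorted_res[max(1, math.ceil(t * N)) - 1]

def f_hat(x, sorted_res, b):
    """Natural-weight density estimate (3.5)/(3.7): count residuals in [x - b, x + b]."""
    count = bisect.bisect_right(sorted_res, x + b) - bisect.bisect_left(sorted_res, x - b)
    return count / (2 * len(sorted_res) * b)

def est_integral_f2(sorted_res, xi, b, m=2000):
    """Midpoint-rule value of the integral of f_hat^2 over [F_hat^{-1}(1/2 - xi), 0]."""
    lower = empirical_quantile(sorted_res, 0.5 - xi)
    h = -lower / m                        # assumes lower < 0 (residuals centred at 0)
    return h * sum(f_hat(lower + (k + 0.5) * h, sorted_res, b) ** 2 for k in range(m))

def est_kink_avar(sorted_res, xi, b):
    """Plug-in estimate v_hat(xi) of (3.2)."""
    return (3 * xi**2 - 4 * xi**3) / (12 * est_integral_f2(sorted_res, xi, b) ** 2)

def choose_xi(residuals, grid, b):
    """Grid value of xi with the smallest estimated asymptotic 'variance'."""
    s = sorted(residuals)
    return min(grid, key=lambda xi: est_kink_avar(s, xi, b))

# Illustration: "residuals" placed at normal quantiles, so the target integral is known.
N = 2000
residuals = [NormalDist().inv_cdf((i + 0.5) / N) for i in range(N)]
s = sorted(residuals)
b = 0.2                                   # illustrative bandwidth
q = NormalDist().inv_cdf(0.1)             # F^{-1}(1/2 - xi) for xi = 0.4
true_integral = math.erf(-q) / (4 * math.sqrt(math.pi))
estimate = est_integral_f2(s, 0.4, b)
grid = [0.05 * k for k in range(1, 10)]   # equally spaced xi values in [0.05, 0.45]
chosen = choose_xi(residuals, grid, b)
print(f"estimate {estimate:.4f} vs true {true_integral:.4f}; chosen xi = {chosen:.2f}")
```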
We begin by establishing several lemmas, from which the proof of the theorem will easily follow. For completeness we start with a very useful lemma due to Jaeckel on the behavior of order statistics.

Lemma 3.1 (Jaeckel [17]) Let $X_1, \dots, X_n$ be iid random variables with cdf $G$. Suppose $G$ is symmetric and has density $g$. Suppose also that there are numbers $\alpha_0 > 0$, $\varepsilon_0 > 0$, and $g_0 > 0$ such that $g(x) \ge g_0$ for all $x$ with $\alpha_0 - \varepsilon_0 < G(x) < 1 - (\alpha_0 - \varepsilon_0)$. Then $X_{(i)} - G^{-1}(i/(n+1))$ is $O_p(n^{-1/2})$ uniformly in $i = [\alpha_0 n] + 1, \dots, n - [\alpha_0 n]$. §

We note that if $g$ is unimodal, then Lemma 3.1 is satisfied for any $\alpha_0 > 0$.

For convenience define $I^* = [F^{-1}(\tfrac12 - \xi_1), 0]$. Recalling the definition of $f_N$ from (3.6), we have

Lemma 3.2 $f_N(x) = O_p(1)$ uniformly for $x \in I^*$. §

Proof Since $w$ is bounded it suffices to show

$$\sup_{x \in I^*}\#\{e_j : |x - e_j| \le Ab(N)\} = O_p(Nb(N)). \tag{3.8}$$

Let $x \in I^*$ and $\tilde\varepsilon = \tfrac12 - \xi_1$. Lemma 3.1 implies that, given $\varepsilon^* > 0$, there exists $M$ such that for all $N$

$$P\big\{|e_{(i)} - F^{-1}(i/(N+1))| \le MN^{-1/2} \text{ for all } i = [\tilde\varepsilon N/4], [\tilde\varepsilon N/4] + 1, \dots, [(1 - \tilde\varepsilon/4)N]\big\} > 1 - \varepsilon^*. \tag{3.9}$$

Let $\Omega$ denote the exceptional set (of probability $< \varepsilon^*$) where these inequalities may be violated. Note that $\Omega$ does not depend on $x$.

In passing we note that the assumptions about the bandwidth $b(N)$ certainly imply $b(N) = o(1)$ and $N^{-1/2} = o(b(N))$. Now consider $e_{(j)}$ such that $|x - e_{(j)}| \le Ab(N)$. Then for all $N$ large

$$F^{-1}(\tilde\varepsilon/2) \le e_{(j)} \le F^{-1}(1 - \tilde\varepsilon/2),$$

since $F^{-1}(\tilde\varepsilon) \le x \le 0$ and $Ab(N) \to 0$ as $N \to \infty$. Thus, for $\omega \notin \Omega$ and all $N$ sufficiently large, (3.9) shows that certainly $j \in \{[\tilde\varepsilon N/4], [\tilde\varepsilon N/4] + 1, \dots, [(1 - \tilde\varepsilon/4)N]\}$. So for $\omega \in \Omega^c$ and $N$ large, $|x - e_{(j)}| \le Ab(N)$ only if $|x - F^{-1}(j/(N+1))| \le Ab(N) + MN^{-1/2}$. But for $N$ large, $Ab(N) + MN^{-1/2} \le 2Ab(N)$. The number of $j$'s for which $|x - F^{-1}(j/(N+1))| \le 2Ab(N)$ is just the number of $j$'s such that

$$F(x - 2Ab(N)) \le \frac{j}{N+1} \le F(x + 2Ab(N)).$$

By the unimodality of $f$, this number is no greater than

$$(N+1)\{F(x + 2Ab(N)) - F(x - 2Ab(N))\} + 1 \le 2 + Nf(0)\cdot 4Ab(N) = O(Nb(N))$$

uniformly in $x$. Thus we have shown that

$$\sup_{x \in I^*}\#\{e_j : |x - e_j| \le Ab(N)\} = O_p(Nb(N)),$$

implying the result. §
Lemma 3.3

$$\sup_{x \in I^*}|\hat f_N(x) - f_N(x)| = o_p(N^{-1/4}).$$

§

Proof Let $x \in I^*$. Then

$$|\hat f_N(x) - f_N(x)| \le (Nb(N))^{-1}\sum_{j=1}^{N}\Big|w\Big(\frac{x - e_j + (e_j - \hat e_j)}{b(N)}\Big) - w\Big(\frac{x - e_j}{b(N)}\Big)\Big|. \tag{3.10}$$


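Let $T$ denote the set of indices for which the two arguments of $w$ on the right hand side (RHS) of (3.10) are both in $[-A, A]$ or both outside $[-A, A]$. That is, $T = T_1 \cup T_2$, where

$$T_1 = \Big\{j : \frac{x - \hat e_j}{b(N)} \text{ and } \frac{x - e_j}{b(N)} \in [-A, A]\Big\}, \qquad T_2 = \Big\{j : \frac{x - \hat e_j}{b(N)} \text{ and } \frac{x - e_j}{b(N)} \in [-A, A]^c\Big\}.$$

(Note that $T_1$, $T_2$, and $T$ depend on $x$ and $\omega$.) Three facts hold:

- $w$ is zero on $[-A, A]^c$;
- $w(y + \delta) - w(y) = \delta w'(y) + o(\delta^2)$ uniformly for $y, y + \delta \in [-A, A]$;
- $\hat e_j - e_j = O_p(N^{-1/2})$ uniformly in $j$ (independently of $x$).

It follows that the RHS of (3.10) is less than or equal to

$$(Nb(N))^{-1}\sum_{j \in T_1}\Big\{O_p\Big(\frac{1}{N^{1/2}b(N)}\Big)\Big|w'\Big(\frac{x - e_j}{b(N)}\Big)\Big| + o_p\Big(\frac{1}{Nb^2(N)}\Big)\Big\} + \frac{B_w}{Nb(N)}\,\#(T^c), \tag{3.11}$$

where the $O_p$ and $o_p$ terms are uniform for $x \in I^*$ and in $j$. Since $|w'| \le B_{w'} = O(1)$,

$$\text{RHS of (3.10)} \le O_p\Big(\frac{1}{N^{3/2}b^2(N)}\Big)\cdot\#(T_1) + o_p\Big(\frac{1}{N^2b^3(N)}\Big)\cdot\#(T_1) + \frac{B_w}{Nb(N)}\cdot\#(T^c). \tag{3.12}$$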
Now $\#(T_1)$ is certainly less than or equal to $\#\{e_j : |x - e_j| \le Ab(N)\}$. By (3.8) this latter quantity is $O_p(Nb(N))$ uniformly for $x \in I^*$, so $\sup_{x \in I^*}\#(T_1) = O_p(Nb(N))$. The assumptions on $b(N)$ imply that $b(N) \ge N^{-1/4}(\log N)^2$ for $N$ sufficiently large. Hence, for all $N$ large, the first term on the RHS of (3.12) is $o_p(N^{-1/4})$ uniformly for $x \in I^*$. Similarly, the second term is $o_p(N^{-1/2})$ uniformly for $x \in I^*$. To evaluate the third term we need to bound $\#(T^c)$ uniformly for $x \in I^*$.

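Let $\varepsilon>0$ be given and consider a fixed $N$. Then there is $M_\varepsilon$ (independent of $N$) and a subset $J_N$ of $\Omega$, with $P\{J_N\} > 1-\varepsilon$, such that $|\tilde e_j - e_j| < M_\varepsilon N^{-1/2}$ for $\omega\in J_N$ and $j = 1,2,\ldots,N$. Since for $N$ sufficiently large $b(N) \ge N^{-1/4}(\log N)^2$, on $J_N$ we have
$$\frac{|\tilde e_j - e_j|}{b(N)} < \frac{M_\varepsilon}{N^{1/4}(\log N)^2}$$
for all $N$ large, $j = 1,\ldots,N$. Thus for $N$ large and $\omega\in J_N$,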


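$$\sup_{x\in I^*}\#(T^c) \le \sup_{x\in I^*}\#\left\{e_j : \frac{x-e_j}{b(N)}\in\left[-A-\frac{M_\varepsilon}{N^{1/4}(\log N)^2},\,-A+\frac{M_\varepsilon}{N^{1/4}(\log N)^2}\right]\ \text{or}\ \frac{x-e_j}{b(N)}\in\left[A-\frac{M_\varepsilon}{N^{1/4}(\log N)^2},\,A+\frac{M_\varepsilon}{N^{1/4}(\log N)^2}\right]\right\}. \tag{3.13}$$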

By the same reasoning as in the proof of (3.8), the RHS of (3.13) is
$$O_p\!\left(\frac{Nb(N)M_\varepsilon}{N^{1/4}(\log N)^2}\right).$$
Since $\varepsilon>0$ was arbitrary, we conclude
$$\sup_{x\in I^*}\#(T^c) = O_p\!\left(\frac{Nb(N)}{N^{1/4}(\log N)^2}\right).$$

Thus the third term on the RHS of (3.12) is $O_p(N^{-1/4}(\log N)^{-2}) = o_p(N^{-1/4})$ uniformly in $x\in I^*$, and so the RHS of (3.12) is
$$o_p(N^{-1/4}) + o_p(N^{-1/2}) + o_p(N^{-1/4}) = o_p(N^{-1/4}),$$
uniformly in $x\in I^*$. Tracing the inequalities back we have
$$\sup_{x\in I^*}\left|\tilde f_N(x)-f_N(x)\right| = o_p(N^{-1/4}),$$
concluding the proof. §
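The comparison in Lemma 3.3 can be tried numerically. The following is a minimal sketch, not the author's procedure. It assumes a biweight kernel on $[-A,A]$ with $A = 1$ as a stand-in for $w$; any bounded kernel with bounded derivative that vanishes off $[-A,A]$ would do. The names `w`, `kde`, and `sup_gap` are hypothetical. The sketch computes $f_N$ from the errors $e_j$ and $\tilde f_N$ from the residuals $\tilde e_j$, and it approximates the supremum over $I^*$ by a maximum over a finite grid.

```python
import random

A = 1.0  # hypothetical half-width of the kernel support [-A, A]


def w(y):
    """Biweight kernel on [-A, A]; a hypothetical stand-in for the kernel w."""
    if abs(y) > A:
        return 0.0
    u = y / A
    return (15.0 / (16.0 * A)) * (1.0 - u * u) ** 2


def kde(x, sample, b):
    """Kernel density estimate (N b)^{-1} * sum_j w((x - s_j) / b)."""
    return sum(w((x - s) / b) for s in sample) / (len(sample) * b)


def sup_gap(errors, residuals, b, grid):
    """Max over a finite grid of |f~_N(x) - f_N(x)|, where f_N is built from
    the errors e_j and f~_N from the residuals e~_j."""
    return max(abs(kde(x, residuals, b) - kde(x, errors, b)) for x in grid)


if __name__ == "__main__":
    random.seed(0)
    n = 400
    errors = [random.gauss(0.0, 1.0) for _ in range(n)]
    residuals = [e + 0.5 / n ** 0.5 for e in errors]  # e~_j - e_j of order N^{-1/2}
    grid = [-2.0 + 0.05 * k for k in range(81)]
    print(sup_gap(errors, residuals, 0.5, grid))
```

Shifting every residual by $0.025$ (of order $N^{-1/2}$ for $N = 400$) moves the estimate by at most $\sup|w'|\cdot 0.025/b^2$. This is the first-order effect isolated in (3.11). For this kernel, $\sup|w'| = 5/(2\sqrt3)\approx 1.44$.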

Corollary 3.1  $\tilde f_N(x) = O_p(1)$ uniformly for $x\in I^*$. §

Proof  This follows immediately from Lemmas 3.2 and 3.3. §

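Corollary 3.2
$$\sup_{\xi\in I}\left|\int_{F^{-1}(\frac12-\xi)}^{0}\tilde f_N^2(t)\,dt - \int_{F^{-1}(\frac12-\xi)}^{0} f_N^2(t)\,dt\right| = o_p(N^{-1/4}).\quad\S$$

Proof  Consider a fixed $\xi\in I$. Note $F^{-1}(\tfrac12-\xi) \ge F^{-1}(\tfrac12-\xi_1)$. By Lemma 3.3,
$$\sup_{t\in I^*}\left|\tilde f_N(t)-f_N(t)\right| = o_p(N^{-1/4}),$$
and by Lemma 3.2 and Corollary 3.1,
$$\sup_{t\in I^*}\left|\tilde f_N(t)+f_N(t)\right| = O_p(1),$$
implying
$$\sup_{t\in I^*}\left|\tilde f_N^2(t)-f_N^2(t)\right| = o_p(N^{-1/4}).$$
Every interval of integration $[F^{-1}(\tfrac12-\xi),0]$ has length at most $|F^{-1}(\tfrac12-\xi_1)|$. From this the result follows. §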


Lemma 3.4
$$\sup_{\xi\in I}\left|\left[\int_{F^{-1}(\frac12-\xi)}^{0}\tilde f_N^2(t)\,dt\right]^2 - \left[\int_{F^{-1}(\frac12-\xi)}^{0} f_N^2(t)\,dt\right]^2\right| = o_p(N^{-1/4}).\quad\S$$

Proof  Let us denote the first integral as $a(\xi)$ and the second as $b(\xi)$, so the quantity of interest is
$$\sup_{\xi\in I}|a^2(\xi)-b^2(\xi)| = \sup_{\xi\in I}\{|a(\xi)-b(\xi)|\cdot|a(\xi)+b(\xi)|\}.$$
From Corollary 3.2 we see it suffices to show
$$\sup_{\xi\in I}|a(\xi)+b(\xi)| = O_p(1). \tag{3.14}$$
But Lemma 3.2 and Corollary 3.1 imply that $f_N^2(t)$ and $\tilde f_N^2(t)$ are $O_p(1)$ uniformly for $t\in I^*$. Hence $a(\xi)$ and $b(\xi)$ are $O_p(1)$ uniformly for $\xi\in I$, and so (3.14) follows. §

Lemma 3.5
$$\sup_{\xi\in I}\left|\left[\int_{F^{-1}(\frac12-\xi)}^{0} f_N^2(t)\,dt\right]^2 - \left[\int_{F^{-1}(\frac12-\xi)}^{0} f^2(t)\,dt\right]^2\right| = o_p(N^{-3/8}).\quad\S$$

Proof  First we show
$$\int_{F^{-1}(\frac12-\xi_1)}^{0}[f_N(t)-f(t)]^2\,dt = o_p(N^{-3/4}). \tag{3.15}$$
By the result of Bickel and Rosenblatt (Theorem 3.1),
$$b(N)^{-1/2}\left[Nb(N)\int[f_N(t)-f(t)]^2a(t)\,dt - \int f(t)a(t)\,dt\cdot\int w^2(z)\,dz\right] = O_p(1). \tag{3.16}$$
Consider $a(t) = 1_{[F^{-1}(\frac12-\xi_1),\,0]}(t)$; we note this choice satisfies the assumptions of Bickel and Rosenblatt. Then, since $w$ is bounded and vanishes off $[-A,A]$, we can assert that
$$\int f(t)a(t)\,dt\cdot\int w^2(z)\,dz = \text{finite constant}.$$
Also, since $b(N) = o(1)$, $b(N)^{1/2} = o(1)$, so that (3.16) implies
$$Nb(N)\int_{F^{-1}(\frac12-\xi_1)}^{0}[f_N(t)-f(t)]^2\,dt = \text{constant} + o_p(1) = O_p(1)$$
and hence
$$\int_{F^{-1}(\frac12-\xi_1)}^{0}[f_N(t)-f(t)]^2\,dt = O_p\!\left(\frac{1}{Nb(N)}\right). \tag{3.17}$$
Again, for all $N$ large, $b(N) \ge N^{-1/4}(\log N)^2$, so
$$(Nb(N))^{-1} \le N^{-3/4}(\log N)^{-2} \quad\text{for all } N \text{ large},$$
which combined with (3.17) implies (3.15).

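We now fix $\xi\in I$ and let $\|g\|_\xi = \left[\int_{F^{-1}(\frac12-\xi)}^{0} g^2(t)\,dt\right]^{1/2}$, the standard norm on the space of square integrable functions defined on $[F^{-1}(\tfrac12-\xi),0]$. Then the quantity of interest in the lemma (neglecting the sup) is
$$\left|\,\|f_N\|_\xi^4 - \|f\|_\xi^4\right|, \tag{3.18}$$
which is equal to
$$\left|\,\|f_N\|_\xi - \|f\|_\xi\right|\cdot\left|\,\|f_N\|_\xi^3 + \|f_N\|_\xi^2\|f\|_\xi + \|f_N\|_\xi\|f\|_\xi^2 + \|f\|_\xi^3\right|.$$
Again using the fact that $f_N(t) = O_p(1)$ uniformly for $t\in I^*$, and using $f \le B_f$, we have $\|f_N\|_\xi = O_p(1)$ and $\|f\|_\xi = O(1)$, implying that
$$\left|\,\|f_N\|_\xi^4 - \|f\|_\xi^4\right| = \left|\,\|f_N\|_\xi - \|f\|_\xi\right|\cdot O_p(1). \tag{3.19}$$
Also, $\|a\| = \|(a-b)+b\| \le \|a-b\| + \|b\|$ by the triangle inequality (a consequence of Cauchy-Schwarz), implying $\|a-b\| \ge \|a\| - \|b\|$. Similar reasoning also yields $\|a-b\| \ge \|b\| - \|a\|$, so
$$\left|\,\|a\| - \|b\|\,\right| \le \|a-b\|.$$
Thus
$$\left|\,\|f_N\|_\xi - \|f\|_\xi\right| \le \|f_N - f\|_\xi. \tag{3.20}$$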
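For any $\xi\in I$, $\|g\|_\xi \le \|g\|_{\xi_1}$; combining this with (3.20) and (3.19), we obtain
$$\begin{aligned}
\sup_{\xi\in I}\left|\left[\int_{F^{-1}(\frac12-\xi)}^{0} f_N^2(t)\,dt\right]^2 - \left[\int_{F^{-1}(\frac12-\xi)}^{0} f^2(t)\,dt\right]^2\right| &\le \|f_N - f\|_{\xi_1}\cdot O_p(1)\\
&= o_p(N^{-3/8})\cdot O_p(1) \quad\text{(by (3.15))}\\
&= o_p(N^{-3/8}).\quad\S
\end{aligned}$$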


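For the random variables $\tilde e_1,\tilde e_2,\ldots,\tilde e_N$, define the inverse empirical cdf $\tilde F_N^{-1}$ by
$$\tilde F_N^{-1}(t) = \inf\{\tilde e_{(j)} : j/N \ge t\},$$
where $\tilde e_{(1)} \le \cdots \le \tilde e_{(N)}$ are the order statistics.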
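The following is a direct transcription of this definition, offered as a sketch; the function name `inverse_ecdf` is hypothetical. With $k = [tN]$, it returns the $k$-th order statistic when $tN$ is an integer and the $(k+1)$-th otherwise. This is the case split used in the proof of Lemma 3.6.

```python
import math
from fractions import Fraction


def inverse_ecdf(sample, t):
    """Inverse empirical cdf: inf{ s_(j) : j/N >= t } for 0 < t <= 1,
    where s_(1) <= ... <= s_(N) are the order statistics of `sample`."""
    t = Fraction(str(t))  # exact decimal value, so j/N >= t is tested without rounding error
    if not 0 < t <= 1:
        raise ValueError("t must lie in (0, 1]")
    ordered = sorted(sample)
    j = math.ceil(t * len(ordered))  # smallest j with j/N >= t
    return ordered[j - 1]  # the text's order statistics are 1-indexed
```

With ten observations, $t = \tfrac12-\xi = 0.3$ gives the third order statistic, and $t = 0.35$ gives the fourth.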


Lemma 3.6
$$\sup_{\xi\in I}\left|F^{-1}(\tfrac12-\xi) - \tilde F_N^{-1}(\tfrac12-\xi)\right| = O_p(N^{-1/2}).\quad\S$$


Proof  It follows easily from Lemma 3.1 that
$$\sup_{\xi\in I}\left|F_N^{-1}(\tfrac12-\xi) - F^{-1}(\tfrac12-\xi)\right| = O_p(N^{-1/2}). \tag{3.21}$$


Thus we consider the difference between $F_N^{-1}$ and $\tilde F_N^{-1}$. Consider a fixed $\xi\in I$. By our definition of $F_N^{-1}$ and $\tilde F_N^{-1}$, we have
$$\tilde F_N^{-1}(\tfrac12-\xi) = \tilde e_{(k)} \ \text{or}\ \tilde e_{(k+1)},$$
depending on whether $(\tfrac12-\xi)N$ is an integer or not, where $k = [(\tfrac12-\xi)N]$ (brackets representing the greatest integer function); also $F_N^{-1}(\tfrac12-\xi) = e_{(k)}$ or $e_{(k+1)}$. To establish that
$$\sup_{\xi\in I}\left|F_N^{-1}(\tfrac12-\xi) - \tilde F_N^{-1}(\tfrac12-\xi)\right| = O_p(N^{-1/2}),$$

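it suffices to show (for example)
$$\left|\tilde e_{(k+1)} - e_{(k)}\right| = O_p(N^{-1/2}) \quad\text{uniformly for } \xi\in I \tag{3.22}$$
(the other cases follow similarly). We recall that $\tilde e_j = e_j + O_p(N^{-1/2})$ uniformly in $j$. This easily implies
$$\tilde e_{(j)} = e_{(j)} + O_p(N^{-1/2}) \quad\text{uniformly in } j.$$
Hence $\tilde e_{(k)} - e_{(k)} = O_p(N^{-1/2})$ uniformly for $\xi\in I$. Thus to obtain (3.22) it certainly suffices to show
$$e_{(i+1)} - e_{(i)} = O_p(N^{-1/2}) \quad\text{uniformly for } i = [\varepsilon N], [\varepsilon N]+1, \ldots, [(1-\varepsilon)N].$$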


But Lemma 3.1 implies
$$e_{(i)} = F^{-1}\!\left(\frac{i}{N+1}\right) + O_p(N^{-1/2}), \qquad e_{(i+1)} = F^{-1}\!\left(\frac{i+1}{N+1}\right) + O_p(N^{-1/2}),$$
uniformly in $i$ in this range. For all $N$ large and all $i$ in this range,
$$F^{-1}(\varepsilon/2) \le F^{-1}\!\left(\frac{i}{N+1}\right) < F^{-1}(1-\varepsilon/2)$$
certainly, so $f(F^{-1}(i/(N+1))) \ge f_0 = f(F^{-1}(\varepsilon/2))$ for all of these $i$ by the unimodality and symmetry of $f$. Then by the mean value theorem and unimodality,
$$F^{-1}\!\left(\frac{i+1}{N+1}\right) - F^{-1}\!\left(\frac{i}{N+1}\right) \le \frac{1}{(N+1)f_0} = O(N^{-1}).$$
Thus (3.22) obtains, which, combined with (3.21), completes the proof. §
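The final step of this proof bounds consecutive spacings of $F^{-1}$ by $1/((N+1)f_0)$ via the mean value theorem. The sketch below checks that bound numerically. It assumes a standard normal $F$ and $\varepsilon = 0.1$; both choices, and the function name, are illustrative only.

```python
import math
from statistics import NormalDist


def quantile_spacing_check(n, eps):
    """Largest spacing F^{-1}((i+1)/(N+1)) - F^{-1}(i/(N+1)) over
    i = [eps N], ..., [(1 - eps) N], returned together with the
    mean-value bound 1 / ((N + 1) f_0), f_0 = f(F^{-1}(eps / 2)).
    F is taken to be standard normal (an illustrative assumption)."""
    nd = NormalDist()
    lo, hi = math.floor(eps * n), math.floor((1 - eps) * n)
    spacing = max(nd.inv_cdf((i + 1) / (n + 1)) - nd.inv_cdf(i / (n + 1))
                  for i in range(lo, hi + 1))
    f0 = nd.pdf(nd.inv_cdf(eps / 2))
    return spacing, 1.0 / ((n + 1) * f0)
```

The bound is attained only in the limit. For finite $N$, the largest spacing occurs at the ends of the index range, where $f$ is still above $f_0$.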
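Let us define
$$a^*(\xi) = \int_{\tilde F_N^{-1}(\frac12-\xi)}^{0}\tilde f_N^2(t)\,dt \quad\text{and}\quad b^*(\xi) = \int_{F^{-1}(\frac12-\xi)}^{0}\tilde f_N^2(t)\,dt. \tag{3.23}$$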


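By Lemma 3.6, given $\varepsilon>0$, there is $M'$ such that for all $N$ large, with probability at least $1-\varepsilon/2$, the following inequalities obtain for all $\xi\in I$:
$$\tilde F_N^{-1}(\tfrac12-\xi) \ge F^{-1}(\tfrac12-\xi_1) \quad\text{and}\quad \left|\tilde F_N^{-1}(\tfrac12-\xi) - F^{-1}(\tfrac12-\xi)\right| \le M'N^{-1/2}.$$
Furthermore, by Corollary 3.1 there exists $M''$ such that
$$\tilde f_N(t) \le M'' \quad\text{for all } t\in[F^{-1}(\tfrac12-\xi_1),0]$$
with probability at least $1-\varepsilon/2$. Combining these results we obtain, with probability at least $1-\varepsilon$, for all $N$ large,
$$|a^*(\xi) - b^*(\xi)| \le M''^2M'N^{-1/2} \ \text{for all } \xi\in I \quad\text{and}\quad a^*(\xi) \le M''^2\,|F^{-1}(\tfrac12-\xi_1)| \ \text{for all } \xi\in I. \tag{3.24}$$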
More briefly, $\sup_{\xi\in I}|a^*(\xi) - b^*(\xi)| = O_p(N^{-1/2})$ and $\sup_{\xi\in I} a^*(\xi) = O_p(1)$.


We now use these results to prove

Lemma 3.7
$$\sup_{\xi\in I}\left|a^{*2}(\xi) - b^{*2}(\xi)\right| = O_p(N^{-1/2}).\quad\S$$

Proof  From previous work we know $\sup_{\xi\in I} b^*(\xi) = O_p(1)$. Thus
$$\sup_{\xi\in I}|a^*(\xi) + b^*(\xi)| = O_p(1).$$
Hence
$$\sup_{\xi\in I}\left|a^{*2}(\xi) - b^{*2}(\xi)\right| = \sup_{\xi\in I}\left[|a^*(\xi)+b^*(\xi)|\cdot|a^*(\xi)-b^*(\xi)|\right] \le O_p(1)\cdot O_p(N^{-1/2}) = O_p(N^{-1/2}).\quad\S$$


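We now return to the

Proof of Theorem  The quantity of interest is
$$\begin{aligned}
&\sup_{\xi\in I}\left|\left[\int_{\tilde F_N^{-1}(\frac12-\xi)}^{0}\tilde f_N^2(t)\,dt\right]^2 - \left[\int_{F^{-1}(\frac12-\xi)}^{0} f^2(t)\,dt\right]^2\right|\\
&\quad\le \sup_{\xi\in I}\left|a^{*2}(\xi) - b^{*2}(\xi)\right|\\
&\qquad + \sup_{\xi\in I}\left|\left[\int_{F^{-1}(\frac12-\xi)}^{0}\tilde f_N^2(t)\,dt\right]^2 - \left[\int_{F^{-1}(\frac12-\xi)}^{0} f_N^2(t)\,dt\right]^2\right|\\
&\qquad + \sup_{\xi\in I}\left|\left[\int_{F^{-1}(\frac12-\xi)}^{0} f_N^2(t)\,dt\right]^2 - \left[\int_{F^{-1}(\frac12-\xi)}^{0} f^2(t)\,dt\right]^2\right|\\
&\quad= O_p(N^{-1/2}) + o_p(N^{-1/4}) + o_p(N^{-3/8})\\
&\quad= o_p(N^{-1/4}),
\end{aligned}$$
by Lemmas 3.7, 3.4, and 3.5 respectively. §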
3.4  Asymptotics for adaptive estimator


We are now ready to consider the adaptive estimator of $\beta$ in detail. We define the estimated variance by
$$v_N(\xi) = \frac{-4\xi^3 + 3\xi^2}{12\left[\int_{\tilde F_N^{-1}(\frac12-\xi)}^{0}\tilde f_N^2(t)\,dt\right]^2}. \tag{3.25}$$
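To make the estimated variance concrete, here is a plug-in sketch of (3.25). It is not the author's implementation. The biweight kernel for $w$, the bandwidth argument `b`, the trapezoidal rule for the integral, and all helper names are assumptions. Rounding in the choice of order statistic is ignored.

```python
import math

A = 1.0  # hypothetical: the kernel w is taken to vanish off [-A, A]


def w(y):
    """Biweight kernel on [-A, A]; a hypothetical stand-in for w."""
    if abs(y) > A:
        return 0.0
    u = y / A
    return (15.0 / (16.0 * A)) * (1.0 - u * u) ** 2


def kde(x, sample, b):
    """Kernel estimate (N b)^{-1} * sum_j w((x - s_j) / b)."""
    return sum(w((x - s) / b) for s in sample) / (len(sample) * b)


def inverse_ecdf(sample, t):
    """inf{ s_(j) : j/N >= t } for 0 < t <= 1 (rounding in t * N ignored)."""
    ordered = sorted(sample)
    j = math.ceil(t * len(ordered))
    return ordered[j - 1]


def estimated_variance(residuals, xi, b, steps=200):
    """Plug-in version of (3.25):
    v_N(xi) = (3 xi^2 - 4 xi^3) / (12 [int_{F~_N^{-1}(1/2 - xi)}^0 f~_N(t)^2 dt]^2),
    with the integral approximated by the trapezoidal rule on `steps` panels."""
    if not 0.0 < xi < 0.5:
        raise ValueError("this sketch needs 0 < xi < 1/2")
    lower = inverse_ecdf(residuals, 0.5 - xi)
    if lower >= 0.0:
        raise ValueError("lower limit of integration must be negative")
    h = -lower / steps
    sq = [kde(lower + k * h, residuals, b) ** 2 for k in range(steps + 1)]
    integral = h * (sum(sq) - 0.5 * (sq[0] + sq[-1]))
    return (3 * xi**2 - 4 * xi**3) / (12.0 * integral**2)
```

Suppose the sketch is fed the 2000 normal quantiles $F^{-1}(i/2001)$ in place of residuals, with $\xi = 0.3$ and $b = 0.3$. The result should then land near the population value $v(0.3)\approx 1.16$ for the standard normal. In an adaptive run, one would evaluate it at each grid point and keep the minimizer.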


Let an interval $[\xi_0,\tfrac12]$ be given, with $\xi_0 > 0$. Then the following asymptotic result obtains for the adaptive kink estimator:

Theorem 3.3  Let $\varepsilon>0$ be given. Under Assumptions A1-A6 and the assumptions in the statement of Jaeckel's theorem (see p. 19), there is a $\delta>0$ such that the adaptive kink estimator $\hat\beta_N^*$ has asymptotic efficiency greater than $1-\varepsilon$. The asymptotic efficiency is computed with respect to the best kink estimator with $\xi\in[\xi_0,\tfrac12]$. Here $\hat\beta_N^*$ is defined as any value of $\beta$ which minimizes the dispersion $D_N(\beta)$ constructed with score function $J_{\hat\xi_N}$ ($J_\xi$ defined in equation (3.1)), where $\hat\xi_N$ is any value of $\xi\in\Xi = \{\xi_0, \xi_0+\delta, \xi_0+2\delta, \ldots\}$ minimizing $v_N(\xi)$. The grid set $\Xi$ does not depend on the unknown error density function $f$. §

Proof  Let $f$ be any density satisfying the assumptions. We will assume the minimum of $\{v(\xi) : \xi\in\Xi\}$ is unique, and we will denote the minimizing value $\bar\xi$. This is to simplify notation; it is otherwise irrelevant to the proof. We break the proof up into several parts.


(i) For the given $\varepsilon>0$, there is $\delta>0$ such that if $f$ satisfies the conditions of the theorem,
$$\frac{\inf_{\xi\in[\xi_0,\frac12]} v(\xi)}{v(\bar\xi)} \ge 1-\varepsilon. \tag{3.26}$$

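Proof of (i): Denote $\int_{F^{-1}(\frac12-\xi)}^{0} f^2(t)\,dt$ by $I(\xi)$. We first show there is a constant $K$ such that
$$\left|\frac{v'(\xi)}{v(\xi)}\right| \le K \quad\text{for all } \xi\in[\xi_0,\tfrac12], \text{ uniformly in } f. \tag{3.27}$$
An easy calculation yields
$$\frac{v'(\xi)}{v(\xi)} = \frac{6-12\xi}{3\xi-4\xi^2} - \frac{2f(F^{-1}(\frac12-\xi))}{I(\xi)}. \tag{3.28}$$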
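Formula (3.28) can be spot-checked by finite differences. As an illustration only, take $f$ to be the standard normal density. Then $f^2(t) = e^{-t^2}/(2\pi)$, and hence $I(\xi) = \operatorname{erf}(-z)/(4\sqrt\pi)$ with $z = F^{-1}(\tfrac12-\xi)$. The function names are hypothetical. The sketch also exposes the ratio $f(F^{-1}(\tfrac12-\xi))/I(\xi)$, which the argument via (3.29) bounds by $\xi^{-1}$.

```python
import math
from statistics import NormalDist

ND = NormalDist()  # hypothetical choice: f = standard normal density


def sq_integral(xi):
    """I(xi) = int_{F^{-1}(1/2 - xi)}^0 f(t)^2 dt, in closed form for the normal."""
    z = ND.inv_cdf(0.5 - xi)
    return math.erf(-z) / (4.0 * math.sqrt(math.pi))


def v(xi):
    """Asymptotic variance (3 xi^2 - 4 xi^3) / (12 I(xi)^2) of the kink estimator."""
    return (3 * xi**2 - 4 * xi**3) / (12.0 * sq_integral(xi) ** 2)


def log_derivative(xi):
    """Right-hand side of (3.28)."""
    f_q = ND.pdf(ND.inv_cdf(0.5 - xi))
    return (6 - 12 * xi) / (3 * xi - 4 * xi**2) - 2 * f_q / sq_integral(xi)


def central_difference(xi, h=1e-5):
    """Numerical d/dxi of log v(xi), for comparison with log_derivative."""
    return (math.log(v(xi + h)) - math.log(v(xi - h))) / (2 * h)
```

For the normal, the two functions should agree to several decimal places at interior points such as $\xi = 0.1, \ldots, 0.4$.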

The first term is bounded for $\xi\in[\xi_0,\tfrac12]$ and does not depend on $f$. The second term (neglecting the constant) is
$$\frac{f(F^{-1}(\frac12-\xi))}{I(\xi)} = \frac{f(F^{-1}(\frac12-\xi))}{\int_{\frac12-\xi}^{\frac12} f(F^{-1}(u))\,du} \tag{3.29}$$
by change of variables. Moreover, $f(F^{-1}(u)) \ge f(F^{-1}(\tfrac12-\xi))$ for $u\in[\tfrac12-\xi,\tfrac12]$, since $F^{-1}$ is increasing and $f$ is unimodal and symmetric about 0. Thus the integral in the denominator of (3.29) is at least $\xi f(F^{-1}(\tfrac12-\xi))$, implying the quantity (3.29) is at most $\xi^{-1}$, which is no larger than $\xi_0^{-1}$. Thus (3.27) obtains.

We return to (3.26). Since $v$ is continuous, the infimum of $v(\xi)$ on $[\xi_0,\tfrac12]$ is achieved. Label a value at which it is achieved $\xi'$, and suppose $\xi'\in[\xi_0+r\delta,\,\xi_0+r\delta+\delta)$ for some $r$. By the mean value theorem
there exists $\xi^*\in[\xi_0+r\delta,\xi']$ such that
$$v(\xi_0+r\delta) - v(\xi') = (\xi'-\xi_0-r\delta)\,|v'(\xi^*)|,$$
with $v'(\xi^*) < 0$ (since we can assume $v(\xi_0+r\delta) > v(\xi')$). Since $0 < \xi'-\xi_0-r\delta < \delta$, this implies
$$1 - \frac{v(\xi')}{v(\xi_0+r\delta)} \le \frac{\delta\,|v'(\xi^*)|}{v(\xi_0+r\delta)} \le \frac{\delta K\,v(\xi^*)}{v(\xi_0+r\delta)}, \quad\text{by (3.27)}. \tag{3.30}$$

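To conclude our proof of (3.26), note that $v(\xi_0+r\delta) \ge v(\bar\xi)$ certainly. It will therefore suffice to determine $\delta$ so that $\delta K v(\xi^*)/v(\xi_0+r\delta) \le \varepsilon$, independent of $\xi^*$ and $f$. This entails bounding $v(\xi^*)/v(\xi_0+r\delta)$. We consider the function
$$u(x) = v(\xi_0+r\delta+x)/v(\xi_0+r\delta) \quad\text{for } x\in[0,\delta], \text{ say}.$$
Then
$$u'(x) = \frac{v'(\xi_0+r\delta+x)}{v(\xi_0+r\delta)} \le \frac{K\,v(\xi_0+r\delta+x)}{v(\xi_0+r\delta)} = Ku(x)$$
by (3.27). Thus $u$ satisfies $u(0) = 1$ and $u'(x)/u(x) \le K$, so $u(x) \le \exp(Kx)$ certainly. Hence $\max_{x\in[0,\delta]} u(x) \le \exp(K\delta)$, which implies
$$\frac{v(\xi_0+r\delta+x)}{v(\xi_0+r\delta)} \le \exp(K\delta) \quad\text{for } x\in[0,\delta].$$
It follows that $v(\xi^*)/v(\xi_0+r\delta) \le \exp(K\delta)$ for any $\xi^*\in[\xi_0+r\delta,\xi']$ and uniformly in $f$. In other words,
$$\frac{\delta K\,v(\xi^*)}{v(\xi_0+r\delta)} \le \delta K\exp(K\delta).$$
Choosing $\delta = \varepsilon/(2K)$, for example, makes $\delta K\exp(K\delta) = (\varepsilon/2)e^{\varepsilon/2} < \varepsilon$ whenever $\varepsilon < 2\log 2$. This concludes the proof of (3.26).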
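The arithmetic behind the choice $\delta = \varepsilon/(2K)$ is easy to check. With this step, $\delta K\exp(K\delta) = (\varepsilon/2)e^{\varepsilon/2}$ regardless of $K$, which is below $\varepsilon$ exactly when $\varepsilon < 2\log 2\approx 1.386$. The helper names below are hypothetical.

```python
import math


def step_for(eps, k_bound):
    """Grid step delta = eps / (2K), the example choice in the proof of (3.26)."""
    return eps / (2.0 * k_bound)


def loss_bound(eps, k_bound):
    """delta * K * exp(K * delta), which the proof needs to be below eps."""
    delta = step_for(eps, k_bound)
    return delta * k_bound * math.exp(k_bound * delta)
```

The constant $K$ from (3.27) cancels, so the same $\varepsilon$ works for every density in the class. Only the step $\delta$ shrinks as $K$ grows.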

(ii) For any $\eta>0$ there are $N_0$ and sets $A_N$, with $P(A_N) > 1-\eta$, such that $N > N_0$ implies $v(\bar\xi)/v(\hat\xi_N) = 1$ on $A_N$.

Proof of (ii): Write $v(\xi)$ in (3.2) as $U(\xi)/V(\xi)$ and $v_N(\xi)$ in (3.25) as $U(\xi)/V_N(\xi)$. Then Theorem 3.2 implies
$$V_N(\xi) = V(\xi) + o_p(N^{-1/4}) \quad\text{uniformly in } \xi. \tag{3.31}$$
Also there exist constants $k_1, k_2, k_3, k_4$ such that
$$0 < k_1 < U(\xi) < k_2 \quad\text{and}\quad 0 < k_3 < V(\xi) < k_4 \quad\text{for all } \xi\in[\xi_0,\xi_1],$$
implying that $v_N(\xi) = v(\xi) + o_p(N^{-1/4})$ uniformly in $\xi\in[\xi_0,\xi_1]$. Thus, given $\eta>0$, there are sets $A_N$, with $P(A_N) > 1-\eta$, such that
$$\sup_{\xi\in\Xi}|v_N(\xi) - v(\xi)| \le N^{-1/4} \quad\text{on } A_N \text{ for all } N \text{ large}.$$
But since $\Xi$ is finite and its minimizer $\bar\xi$ is unique, for all $N$ sufficiently large
$$2N^{-1/4} < \min_{\xi\in\Xi,\ \xi\ne\bar\xi}\left(v(\xi) - v(\bar\xi)\right),$$
which implies that on $A_N$ ($N$ sufficiently large)
$$v_N(\bar\xi) < \min_{\xi\in\Xi,\ \xi\ne\bar\xi} v_N(\xi);$$
this inequality implies $\hat\xi_N = \bar\xi$, and thus the result.
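The selection argument in (ii) uses only two facts: the grid $\Xi$ is finite, and its minimizer is unique. Suppose every estimate is within $d$ of the true value and $2d$ is less than the gap between the best and second-best grid values. Then the estimated minimizer must be the true one. The sketch below illustrates this with made-up grid values, which are not computed from any density. The function names are hypothetical.

```python
def grid_argmin(values):
    """Index of the smallest value on a finite grid (ties go to the first index)."""
    return min(range(len(values)), key=lambda i: values[i])


def min_gap(values):
    """min over non-minimizing grid points of v(xi) - v(xi_bar)."""
    k = grid_argmin(values)
    return min(val - values[k] for i, val in enumerate(values) if i != k)


def worst_case_perturbation(values, d):
    """Raise the minimizer by d and lower every other point by d: the least
    favourable estimate that still satisfies sup |v_N - v| <= d."""
    k = grid_argmin(values)
    return [val + d if i == k else val - d for i, val in enumerate(values)]
```

With the values `[1.30, 1.20, 1.15, 1.18, 1.25]` the gap is $0.03$. A worst-case perturbation of size $0.014$ keeps index 2 as the minimizer, while $0.016$ moves it to index 3. So the factor 2 in the condition $2N^{-1/4} < \min(v(\xi)-v(\bar\xi))$ cannot be dropped.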

Thus, if we define $\hat\beta_N$ to be the kink estimator derived from the score function $J_{\bar\xi}$, we have $\hat\beta_N^* = \hat\beta_N$ on the sets $A_N$ for all $N$ sufficiently large. Since $\eta>0$ was arbitrary, this implies
$$\hat\beta_N^* = \hat\beta_N + o_p(N^{-1/2}),$$
and so by Mann-Wald theory the asymptotic distributions of $\hat\beta_N^*$ and $\hat\beta_N$ (suitably normalized) are the same. Specifically, this implies the asymptotic "variance" of $\hat\beta_N^*$ is $v(\bar\xi)$, which combined with (3.26) yields the theorem. §
CHAPTER  4.  FUTURE  WORK


4.1  Possible  extensions 


There  are  a  number  of  topics  in  the  area  of  robust  regression 
considered  in  this  paper  which  need  further  study.  Undoubtedly  the 
most  important  theoretical  problem  still  unanswered  is  the  asymptotic 
normality  of  Jaeckel-type  estimators  based  on  a  non-monotone  score 
function  for  both  the  simple  linear  regression  and  the  general  linear 
regression  models.  There  is  every  reason  to  believe  that,  subject 
to  certain  restrictions  (as  indicated  by  the  counterexample  of  Section 
2.3),  a  result  like  Jaeckel's  theorem  (see  Section  2.1)  obtains  for 
such estimators. The extension of the consistency result of Section 2.2
to asymptotic normality does not appear simple, however, in contrast with,
say, the case of maximum likelihood estimates. Some work the author has done
seems to indicate a plausible approach to the extension, although the
technique is very complicated.

On the other hand, the extension of the consistency result to the case
of a vector parameter $\beta$ is very straightforward. Indeed the basic proof
of Section 2.2 continues to hold with appropriate modifications. For example,
the condition $|c_j| \le B_c$ for the regression constants becomes
$|c_j| \le B_c$ where $|\cdot|$ is now the Euclidean norm. Similarly, the
compactness condition for the set of possible parameter values $B^\circ$ is
now that $B^\circ$ should be compact in the corresponding Euclidean space,
and so forth.
A further extension of the consistency result would follow from
weakening the boundedness conditions on the $\{c_j\}$ so as to allow
$|c_j|\to\infty$ at a sufficiently slow rate (for example, one might use
Noether's condition: see [20]). Obviously one should also be able to weaken
some of the technical conditions on $f$ (Assumption F2) and the score
function $J$ (Assumption J2). These extensions would very possibly come
at the expense of a much more involved set of proofs. Nothing, however,
appears inherently to demand the restrictions we invoked. The unimodality
restriction of Assumption F1 is a different matter, as the counterexample
indicates. A most intriguing question is: exactly what sort of conditions
does one need to impose on the error distribution to ensure correct
asymptotic behavior for estimators using non-monotone score functions?

With regard to the results on adapting, there are a number of
important extensions that would be very desirable. First, there are
obviously a number of alternatives to the kink family which could be
used for adapting and whose behavior might lead to better estimators,
especially for small samples, than the one we proposed. Second, the
implications of extending asymptotic normality to estimators based on
non-monotone scores would be very important in adapting, since one
could then utilize more flexible families containing non-monotone
members. In such a case one could realistically hope to construct
an adaptive estimator whose asymptotic efficiency, relative to the
Cramér-Rao bound, would be very high across a very large nonparametric
class of error distributions, subject only to regularity conditions.
When such a result is achieved, we will have a reasonable understanding
of the problem of robust regression.